The University of Nijmegen (the Netherlands) has been working on tool combinations that do exactly that: they use machine learning on large corpora, for Dutch as well as English.
The Dutch front-end is called valkuil.net. More info on the tools and usage is here: http://webservices-lst.science.ru.nl/
It is very much a server-based solution.

Ruud

On 01-03-13 20:42, Paolo Bianchini wrote:
> We are facing the same issue in Italian: without understanding the context it
> is hard to disambiguate by means of general rules. You need to get to the
> level of specific words.
>
> I came to the conclusion that this problem should be addressed at the tagger
> level by providing context-based tagging (at least in the first instance).
> The tagger should use a large corpus of correct sentences and their
> corresponding tags in order to incorporate a knowledge base.
>
> Moreover, the tool itself should be able to feed additional correct
> sentences into the corpus and learn when needed.
>
> I understand that a tagger based on simple word lookup is at the base of the
> way it works right now, but I don't think that such an implementation
> would be incompatible.
>
> Ciao.
>
> Paolo
>
> On 01/mar/2013, at 14:44, "Mike Unwalla" <[email protected]> wrote:
>
>> Daniel wrote: Has anybody an idea how practical it would be to find these
>> [noun] phrases with disambiguation rules?
>>
>> Probably, you can do it, but a simple rule is unlikely to be sufficient. I
>> had a related problem when I wanted to disambiguate nouns and verbs.
>>
>> The groups of examples that follow show some problems that I had with the
>> identification of noun phrases. The target noun phrases are in CAPITAL
>> LETTERS:
>>
>> SOME THIN OIL FILTERS are not satisfactory.
>> SOME THIN OIL filters through the sand.
>>
>> THE TEMPERATURE INCREASES and decreases are small.
>> THE TEMPERATURE increases and the gas expands.
>>
>> USED PLASTIC COVERS are not satisfactory.
>> The technician used PLASTIC COVERS, not metal covers.
>>
>> The next 3 examples show a semantic problem. Without giving LT information
>> about real-world meaning, LT cannot correctly disambiguate the text.
>>
>> The technician made THE OIL FILTER from a piece of old rag.
>> The technician made THE OIL filter into a clean container.
>> The technician made THE OIL FILTER into a toy rocket for his 7-year-old son.
>>
>> To see my rules, look at the rulegroup id="POS_DISAMBIGUATION_IDENTIFY_NOUN"
>> in
>> www.simplified-english.co.uk/disambiguation-en-asdste100-issue3-2013-02-01.zip.
>> (The rules use new POS, not the default POS in LT.)
>>
>> Regards,
>>
>> Mike Unwalla
>> Contact: www.techscribe.co.uk/techw/contact.htm
>>
>>
>> -----Original Message-----
>> From: Daniel Naber [mailto:[email protected]]
>> Sent: 01 March 2013 10:28
>> To: development discussion for LanguageTool
>> Subject: finding English phrases
>>
>> Hi,
>>
>> one of the significant sources of false alarms in English is the fact that
>> LT doesn't properly handle phrases. For example:
>>
>> "There are several cargo and passenger ferries."
>>
>> leads to an error because only "several cargo" is considered and LT
>> requires "several" to be followed by a plural noun. Instead, "cargo and
>> passenger ferries" should be considered one plural noun phrase.
>>
>> Has anybody an idea how practical it would be to find these phrases with
>> disambiguation rules? One could do this (just an example, it doesn't fully
>> cover the example above):
>>
>> <rule id="NNPS_PHRASE1" name="plural noun phrase">
>>   <pattern>
>>     <marker>
>>       <token postag="NN"></token>
>>     </marker>
>>     <token postag="NNS"></token>
>>   </pattern>
>>   <disambig action="add"><wd pos="NNPS_PHRASE_START"/></disambig>
>> </rule>
>>
>> Then the rules that now look for plural nouns would have to be changed to
>> look for NNPS_PHRASE_START.
>>
>> Is there a way to get "longest match" with disambiguation rules? It seems
>> to me it's at least difficult to remove shorter phrases inside longer
>> phrases.
>>
>> Any ideas or actual rules for this are very welcome. I think this is one of
>> the remaining major problems for English (and actually not only English).
>>
>> Regards
>> Daniel
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_feb
>> _______________________________________________
>> Languagetool-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
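
Daniel's "longest match" question can be illustrated outside LT with a small Python sketch. This is not LT's disambiguator and does not use any LT API: the function names, the tag sets, and the hand-assigned POS tags are all assumptions made for illustration. It scans left to right, takes the longest run of NN/NNS/CC tokens that ends in NNS, and then skips past that run so that shorter phrases inside it are never reported.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); tags are hand-assigned, not from LT

# Assumed tag sets for this sketch only: a phrase starts with a noun and
# may continue through nouns and coordinating conjunctions.
START_TAGS = {"NN", "NNS"}
INNER_TAGS = {"NN", "NNS", "CC"}


def longest_span_end(tokens: List[Token], start: int) -> int:
    """Return the exclusive end index of the longest span beginning at
    `start` that ends in a plural noun (NNS), or -1 if there is none."""
    if tokens[start][1] not in START_TAGS:
        return -1
    best_end = -1
    i = start
    while i < len(tokens) and tokens[i][1] in INNER_TAGS:
        if tokens[i][1] == "NNS":
            best_end = i + 1  # remember the furthest plural noun seen so far
        i += 1
    return best_end


def find_plural_noun_phrases(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Leftmost-longest matching: once a span is accepted, scanning resumes
    after it, so phrases nested inside a longer phrase are not reported."""
    phrases = []
    i = 0
    while i < len(tokens):
        end = longest_span_end(tokens, i)
        if end == -1:
            i += 1
        else:
            phrases.append((i, end))
            i = end
    return phrases


SENTENCE: List[Token] = [
    ("There", "EX"), ("are", "VBP"), ("several", "JJ"), ("cargo", "NN"),
    ("and", "CC"), ("passenger", "NN"), ("ferries", "NNS"), (".", "."),
]

if __name__ == "__main__":
    for begin, end in find_plural_noun_phrases(SENTENCE):
        print(" ".join(word for word, _ in SENTENCE[begin:end]))
    # prints "cargo and passenger ferries"
```

The sketch assumes the tags are already correct. As Mike's examples show ("OIL FILTERS" versus "oil filters through the sand"), that is the hard part, so a span matcher like this only helps after the noun/verb ambiguity has been resolved.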
