Daniel wrote: Has anybody an idea how practical it would be to find these [noun] phrases with disambiguation rules?
Probably, you can do it, but a simple rule is unlikely to be sufficient. I had a related problem when I wanted to disambiguate nouns and verbs.

The groups of examples that follow show some problems that I had with the identification of noun phrases. The target noun phrases are in CAPITAL LETTERS:

SOME THIN OIL FILTERS are not satisfactory.
SOME THIN OIL filters through the sand.

THE TEMPERATURE INCREASES and decreases are small.
THE TEMPERATURE increases and the gas expands.

USED PLASTIC COVERS are not satisfactory.
The technician used PLASTIC COVERS, not metal covers.

The next 3 examples show a semantic problem. Without giving LT information about real-world meaning, LT cannot correctly disambiguate the text.

The technician made THE OIL FILTER from a piece of old rag.
The technician made THE OIL filter into a clean container.
The technician made THE OIL FILTER into a toy rocket for his 7-year-old son.

To see my rules, look at the rulegroup id="POS_DISAMBIGUATION_IDENTIFY_NOUN" in www.simplified-english.co.uk/disambiguation-en-asdste100-issue3-2013-02-01.zip. (The rules use new POS, not the default POS in LT.)

Regards,
Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm

-----Original Message-----
From: Daniel Naber [mailto:[email protected]]
Sent: 01 March 2013 10:28
To: development discussion for LanguageTool
Subject: finding English phrases

Hi,

one of the significant sources of false alarms in English is the fact that LT doesn't properly handle phrases. For example:

"There are several cargo and passenger ferries."

leads to an error because only "several cargo" is considered and LT requires "several" to be followed by a plural noun. Instead, "cargo and passenger ferries" should be considered one plural noun phrase.

Has anybody an idea how practical it would be to find these phrases with disambiguation rules?
One could do this (just an example, it doesn't fully cover the example above):

    <rule id="NNPS_PHRASE1" name="plural noun phrase">
      <pattern>
        <marker>
          <token postag="NN"></token>
        </marker>
        <token postag="NNS"></token>
      </pattern>
      <disambig action="add"><wd pos="NNPS_PHRASE_START"/></disambig>
    </rule>

Then the rules that now look for plural nouns would have to be changed to look for NNPS_PHRASE_START.

Is there a way to get "longest match" with disambiguation rules? It seems to me it's at least difficult to remove shorter phrases inside longer phrases.

Any ideas or actual rules for this are very welcome. I think this is one of the remaining major problems for English (and actually not only English).

Regards
Daniel
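
To make the "longest match" question concrete, here is a minimal sketch in Python, outside LanguageTool. It does not use LT's disambiguator or its XML rule format, and it says nothing about whether LT rules can do this. The POS tags are assigned by hand in Penn Treebank style, and the names PLURAL_NP, SENTENCE and find_longest_phrases are hypothetical. The phrase pattern is also an assumption: nouns, optionally joined by a conjunction, ending in a plural noun. At each position the longest candidate span is tried first, so a shorter phrase inside a longer one ("passenger ferries" inside "cargo and passenger ferries") is never reported separately.

```python
import re

# Assumed pattern over space-joined POS tags: zero or more "NN" or
# "NN CC NN" groups, followed by a final plural noun "NNS".
PLURAL_NP = re.compile(r"(?:NN (?:CC NN )?)*NNS")

# Hand-tagged example sentence (the tags are an assumption, not LT output).
SENTENCE = [
    ("There", "EX"), ("are", "VBP"), ("several", "JJ"),
    ("cargo", "NN"), ("and", "CC"), ("passenger", "NN"),
    ("ferries", "NNS"), (".", "."),
]


def find_longest_phrases(tagged):
    """Return (start, end) index pairs of non-overlapping phrases.

    Scans left to right. At each start index the longest span is tried
    first; after a match the scan continues past the matched span, so
    phrases nested inside it are skipped.
    """
    spans = []
    i = 0
    n = len(tagged)
    while i < n:
        match_end = None
        for j in range(n, i, -1):  # longest candidate first
            tags = " ".join(tag for _, tag in tagged[i:j])
            if PLURAL_NP.fullmatch(tags):
                match_end = j
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


if __name__ == "__main__":
    for start, end in find_longest_phrases(SENTENCE):
        # The first token of each span is where a tag such as
        # NNPS_PHRASE_START would be added.
        print(start, " ".join(word for word, _ in SENTENCE[start:end]))
```

This sketch only handles the syntactic case. It cannot separate "THE OIL FILTER" from "THE OIL filter" in Mike's examples, because that decision depends on the tagger's choice of noun or verb, and in the last group on real-world meaning.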
