On 2013-03-01 11:27, Daniel Naber wrote:
> Hi,
>
> one of the significant sources of false alarms in English is the fact that
> LT doesn't properly handle phrases. For example:
>
> "There are several cargo and passenger ferries."
>
> leads to an error because only "several cargo" is considered and LT
> requires "several" to be followed by a plural noun. Instead, "cargo and
> passenger ferries" should be considered one plural noun phrase.
>
> Has anybody an idea how practical it would be to find these phrases with
> disambiguation rules? One could do this (just an example, it doesn't fully
> cover the example above):
>
> <rule id="NNPS_PHRASE1" name="plural noun phrase">
>   <pattern>
>     <marker>
>       <token postag="NN"></token>
>     </marker>
>     <token postag="NNS"></token>
>   </pattern>
>   <disambig action="add"><wd pos="NNPS_PHRASE_START"/></disambig>
> </rule>
>
> Then the rules that now look for plural nouns would have to be changed to
> look for NNPS_PHRASE_START.
>
> Is there a way to get "longest match" with disambiguation rules? It seems
> to me it's at least difficult to remove shorter phrases inside longer
> phrases.
>
> Any ideas or actual rules for this are very welcome. I think this is one of
> the remaining major problems for English (and actually not only English).

Actually, it's possible to build noun phrase definitions as a cascade of disambiguation rules. This has been done successfully with Spejd (a shallow parser developed for Polish), and Constraint Grammar does it as well. The trick, I believe, is to start with the longest sequences of nouns and then look for shorter ones. One cannot really find a sequence of nouns and adjectives longer than 9 elements in real language.

But before we do that, we need a lot more disambiguation between verbs and nouns. We could use Brill tagger rules as inspiration, or simply train the Brill tagger on a real corpus, but one annotated with the real tags rather than the "expected" ones; the Brown corpus, for instance, sometimes tags "he do" as PRP VBZ instead of PRP VBP.

I've written this idea up on the ideas page.

Best,
Marcin

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
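
To make the "longest sequences first" idea from the reply concrete, here is a rough sketch in Python rather than in LT's rule XML. It is only an illustration under several assumptions: the input is already POS-tagged, a plural phrase is any run of NN/CC tokens closed by an NNS head, and the names `find_plural_phrases`, `INNER_TAGS` and `HEAD_TAG` are made up for this example. It is not how LT, Spejd or Constraint Grammar implement it.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Assumptions for this sketch only (not LT internals):
INNER_TAGS = {"NN", "CC"}  # tags allowed before the head noun
HEAD_TAG = "NNS"           # a plural noun closes the phrase
MAX_LEN = 9                # upper bound on phrase length suggested in the reply


def find_plural_phrases(tokens: List[Token],
                        max_len: int = MAX_LEN) -> List[Tuple[int, int]]:
    """Return (start, end) spans (end exclusive), matched longest-first."""
    claimed = [False] * len(tokens)
    spans = []
    # Cascade: try the longest window first, then progressively shorter ones.
    for length in range(min(max_len, len(tokens)), 1, -1):
        for start in range(len(tokens) - length + 1):
            end = start + length
            # Skip windows overlapping a longer phrase that was already found.
            if any(claimed[start:end]):
                continue
            tags = [tag for _, tag in tokens[start:end]]
            # The phrase must end in a plural noun and not start with "and".
            if tags[-1] != HEAD_TAG or tags[0] == "CC":
                continue
            if all(tag in INNER_TAGS for tag in tags[:-1]):
                spans.append((start, end))
                for i in range(start, end):
                    claimed[i] = True
    return sorted(spans)


sentence = [("There", "EX"), ("are", "VBP"), ("several", "JJ"),
            ("cargo", "NN"), ("and", "CC"), ("passenger", "NN"),
            ("ferries", "NNS"), (".", ".")]

for start, end in find_plural_phrases(sentence):
    print(" ".join(word for word, _ in sentence[start:end]))
# prints: cargo and passenger ferries
```

Because the longer window claims its tokens first, the shorter "passenger ferries" inside it is never marked separately, which is the "longest match" behaviour asked about. In LT terms, the first token of each span would be the one to receive NNPS_PHRASE_START.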
