On 2014-01-02 11:45, Kumara Bhikkhu wrote:
> I realise that the pattern needs to be better
> defined. But lots of funny things come up as
> well. E.g., besides "or", even "road" and "bullfrog"
> are adjectives. And lots of nouns that are not
> within NN:U|NN:UN are flagged too. Surely people
> will complain about it, but these aren't caused by the rule pattern.

Actually, "road" can be used as an adjective, for example in "road sign".
Also, any connective, including "or", can be used as a noun ("No buts",
"I'm fed up with your ifs and ors"), even if rarely.

> Could it be worth investigating?

The pattern you created might be useful if it lists only sure cases of
wordiness. Simply spell out the exact words of the phrases you consider
redundant; the general sequence adjective + "in" + noun, on the other
hand, is perfectly grammatical English. So I'd go for

<pattern>
  <token>big</token>
  <token>in</token>
  <token>size</token>
</pattern>

etc. Lots of patterns to create, sure, but a lot more useful.

Disambiguation should relate to cases when we really have errors, and
it's sometimes quite unclear without the context that there is an error
in our lexicon. In most cases, it's just that we don't remove
interpretations that we should.

Best,
Marcin

>
> kb
>
> Marcin Miłkowski wrote thus at 06:00 PM 02-01-14:
>> On 2014-01-01 11:55, Kumara Bhikkhu wrote:
>>> Daniel Naber wrote thus at 06:07 PM 01-01-14:
>>>> On 2013-12-31 07:18, Kumara Bhikkhu wrote:
>>>>
>>>>> I found a strangely flagged string: "or in heaven". It's by one of my
>>>>> test rules.
>>>>
>>>> Thanks, I have fixed that by adding a rule to disambiguation.xml. I have
>>>> also removed the NN:U reading which doesn't seem useful for lowercase
>>>> "or". Not sure if using disambiguation makes sense, but this way we
>>>> don't have to touch the binary dictionary for such a small change.
>>>
>>> Well, I could just add "or" as an exception. It's weird how "or" got
>>> into the adjective category.
>>>
>>> JFYI, this is a rule for the common tendency by academics to say "big
>>> in size", "red in color", "few in number", etc. It's less pompous to
>>> just say "big", "red", or "few". I think it's good enough now. Shall
>>> send it to you privately.
>>
>> The rule is right now creating only false alarms, and none of the
>> matches is actually an error (some of them match because of the lack of
>> disambiguation but most of them match correctly):
>>
>> https://languagetool.org/regression-tests/20140101/result_en_20140101.html
>>
>> If you want to match "big in size", then it's best to use these tokens,
>> and not a sequence of POS tags, as POS tags will match correct phrases.
>>
>> I will remove this rule because it's not useful at all in its current
>> version, unfortunately, and it will only cause people to complain.
>>
>> Best,
>> Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
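
For illustration, the token-based approach suggested in the reply above might look roughly like the following complete rule. This is only a sketch: the rule id, name, message wording, and example sentences are made up for this note, and the exact attributes expected on <example> may differ between LanguageTool versions. Each further phrase ("red in color", "few in number") would presumably get its own rule of the same shape.

```xml
<!-- Hypothetical rule: id, name, and message text are illustrative only. -->
<rule id="BIG_IN_SIZE" name="Wordiness: 'big in size'">
  <pattern>
    <!-- Literal tokens instead of POS tags, so a correct phrase
         such as "or in heaven" cannot match. -->
    <token>big</token>
    <token>in</token>
    <token>size</token>
  </pattern>
  <message>This phrase may be wordy. Consider using <suggestion>big</suggestion>.</message>
  <!-- Sentence that should trigger the rule, with the expected match marked. -->
  <example correction="big">The house is <marker>big in size</marker>.</example>
  <!-- Sentence that should not trigger the rule. -->
  <example>The house is big.</example>
</rule>
```

Because every token is spelled out, the rule can only fire on the exact phrase, at the cost of needing one rule (or one alternative) per redundant phrase.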