Hello everybody,

I may be totally wrong, but I believe the lemmatizers in LanguageTool are dictionary-based. I suppose a dictionary entry consists of a form, a lemma, and a POS tag.
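To make the assumed entry structure concrete, here is a minimal sketch of a dictionary keyed by inflected form, where each entry holds a form, a lemma, and a POS tag. The class LemmaDictionary, its methods, and the sample tags "N" and "V" are hypothetical illustrations only; they are not LanguageTool's API or its actual tagset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory dictionary; NOT LanguageTool's implementation.
class LemmaDictionary {

    // One dictionary entry: an inflected form, its lemma, and a POS tag.
    static final class Entry {
        final String form;
        final String lemma;
        final String posTag;

        Entry(String form, String lemma, String posTag) {
            this.form = form;
            this.lemma = lemma;
            this.posTag = posTag;
        }
    }

    // Entries grouped by inflected form; one form may have several readings.
    private final Map<String, List<Entry>> entriesByForm = new HashMap<>();

    void add(String form, String lemma, String posTag) {
        entriesByForm
            .computeIfAbsent(form, k -> new ArrayList<>())
            .add(new Entry(form, lemma, posTag));
    }

    // All entries for a form; an unknown form yields an empty list.
    List<Entry> lookup(String form) {
        return entriesByForm.getOrDefault(form, Collections.emptyList());
    }

    // Lemma of the first entry whose POS tag equals the given tag, if any.
    Optional<String> lemmaFor(String form, String posTag) {
        return lookup(form).stream()
            .filter(e -> e.posTag.equals(posTag))
            .map(e -> e.lemma)
            .findFirst();
    }

    public static void main(String[] args) {
        LemmaDictionary dict = new LemmaDictionary();
        // Sample entries with made-up tags.
        dict.add("saw", "saw", "N");
        dict.add("saw", "see", "V");
        System.out.println(dict.lemmaFor("saw", "V").orElse("(no match)"));
    }
}
```

With a structure like this, a lookup by form alone returns every reading of an ambiguous form, while a lookup by form plus a POS tag from an external tagger narrows it down to one lemma, provided the external tagger's tags can be mapped onto the dictionary's tagset.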
Assuming this is correct, is there a simple way to do a lookup in such a dictionary? Also, is there a way to find out which tagsets these dictionaries use (or is there perhaps a standard within LanguageTool, e.g. verbs are always V and nouns are always N)?

I would like a method that accepts an inflected form and a POS tag and returns a single lemma. Currently I am doing the following, but it seems a bit awkward:

    List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
    AnalyzedSentence as = new AnalyzedSentence(
        rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
    as = lang.getDisambiguator().disambiguate(as);
    String best = getMostFrequentLemma(as.getTokens()[i]);

In particular, I would like to use a different POS tagger. I have various statistical POS taggers at my disposal that produce a single POS tag per token, and that is what I want. The LanguageTool POS tagger produces multiple unranked POS tags per token.

Cheers,

-- Richard

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel