How to use the lemmatizer

Richard Eckart de Castilho Mon, 27 Jan 2014 06:45:07 -0800

Hello everybody,

I may be totally wrong, but I believe the lemmatizers in LanguageTool are 
dictionary-based. I suppose a dictionary entry consists of a form, a lemma, 
and a POS tag.


Assuming this is correct, is there a simple way to do a lookup in such a 
dictionary? 

Also, is there a way to find out which tagsets these dictionaries use? Or is 
there perhaps a standard across LanguageTool, e.g. verbs are always V and 
nouns are always N, or something like that?

I would like a method that accepts an inflected form and a POS tag and 
returns a single lemma.
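
A minimal sketch of the kind of lookup meant here, assuming a hypothetical 
in-memory dictionary of (form, lemma, POS tag) entries. LemmaDictionary, add 
and lookup are illustrative names and not LanguageTool API, and V/N are just 
the placeholder tags from the question above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory lemma dictionary; not part of LanguageTool.
class LemmaDictionary {

    // One dictionary entry: the lemma and POS tag recorded for an inflected form.
    private static final class Entry {
        final String lemma;
        final String posTag;

        Entry(String lemma, String posTag) {
            this.lemma = lemma;
            this.posTag = posTag;
        }
    }

    // Maps an inflected form to all entries known for it.
    private final Map<String, List<Entry>> entriesByForm = new HashMap<>();

    // Registers one (form, lemma, POS tag) triple.
    void add(String form, String lemma, String posTag) {
        entriesByForm.computeIfAbsent(form, k -> new ArrayList<>())
                     .add(new Entry(lemma, posTag));
    }

    // Returns the lemma of the first entry matching both form and POS tag,
    // or an empty Optional if the dictionary has no such entry.
    Optional<String> lookup(String form, String posTag) {
        for (Entry e : entriesByForm.getOrDefault(form, Collections.emptyList())) {
            if (e.posTag.equals(posTag)) {
                return Optional.of(e.lemma);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        LemmaDictionary dict = new LemmaDictionary();
        dict.add("saw", "see", "V"); // verb reading
        dict.add("saw", "saw", "N"); // noun reading
        System.out.println(dict.lookup("saw", "V").orElse("<unknown>"));
        System.out.println(dict.lookup("saw", "N").orElse("<unknown>"));
    }
}
```

The POS tag passed to lookup would come from whatever external tagger is in 
use, so its tagset has to match the one stored in the dictionary; that is 
the reason for the tagset question above.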


Currently I am doing the following, but it seems a bit awkward:

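// tokenText holds the token strings of one sentence (Tagger.tag() takes a List<String>)
List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
AnalyzedSentence as = new AnalyzedSentence(
  rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
as = lang.getDisambiguator().disambiguate(as);
// getMostFrequentLemma is a helper method not shown here; i is the index of the token of interest
String best = getMostFrequentLemma(as.getTokens()[i]);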

In particular, I would like to use a different POS tagger. I have various 
statistical POS taggers at my disposal that produce a single POS tag per 
token, which is what I want. The LanguageTool POS tagger produces multiple 
unranked POS tags per token.

Cheers,

-- Richard