Hello,

On 2014-01-27 15:44, Richard Eckart de Castilho wrote:
> Hello everybody,
>
> I may be totally wrong, but I believe the lemmatizers in LanguageTool are 
> implemented based on dictionaries. I suppose a dictionary entry would be made 
> up of a form, a lemma, and a pos tag.
>
> Assuming this is correct, is there a simple way to do a lookup in such a 
> dictionary?
>
> Also, is there a way to find out which tagsets are used by these dictionaries 
> (or maybe there is even some standard in LanguageTool, e.g. verbs are always 
> V and nouns are always N or something like that)?
>
> I would like a method that accepts an inflected form and a pos tag and that 
> returns a single lemma.
>
>
> Currently, I am doing this, but it seems a bit awkward.
>
> List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
> AnalyzedSentence as = new AnalyzedSentence(
>    rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
> as = lang.getDisambiguator().disambiguate(as);
> String best = getMostFrequentLemma(as.getTokens()[i]);
>
> In particular, I would like to use a different POS tagger. I have various 
> statistical POS taggers at my disposal that produce a single POS per token - 
> and that is what I want. The LanguageTool POS tagger produces multiple 
> unranked POS tags per token.

Beware that statistical POS taggers will necessarily mask 
ungrammatical input, because they try to guess the correct tags even 
for erroneous text. This makes them nearly useless for writing rules. 
We've been there and tried that. For example, I have yet to find an 
English statistical POS tagger that would be useful for this.

Note, however, that if you have frequency information, you can add it 
to your tagger dictionary. We can indeed do this using typing frequency 
lists, so I guess you'd be able to assign the most frequent lemma if 
you need to. The procedure is described here:

http://wiki.languagetool.org/hunspell-support

See under "including frequency data".
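
To illustrate the idea only: below is a minimal sketch of a lookup that 
takes an inflected form and a POS tag and returns the most frequent 
matching lemma. It does not use the LanguageTool API. The class 
LemmaLookup, its methods, the Penn-style tags, and the frequency counts 
are all hypothetical and made up for the example; a real setup would 
read the entries and frequencies from the tagger dictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory dictionary; NOT part of LanguageTool. */
class LemmaLookup {

    /** One reading of a form: lemma, POS tag, and an assumed frequency count. */
    static final class Entry {
        final String lemma;
        final String posTag;
        final int frequency;

        Entry(String lemma, String posTag, int frequency) {
            this.lemma = lemma;
            this.posTag = posTag;
            this.frequency = frequency;
        }
    }

    // Maps an inflected form to all of its (lemma, tag, frequency) readings.
    private final Map<String, List<Entry>> entriesByForm = new HashMap<>();

    void add(String form, String lemma, String posTag, int frequency) {
        entriesByForm.computeIfAbsent(form, k -> new ArrayList<>())
                     .add(new Entry(lemma, posTag, frequency));
    }

    /**
     * Returns the lemma of the most frequent reading of the form that
     * carries the given POS tag, or null if there is no such reading.
     */
    String lookup(String form, String posTag) {
        Entry best = null;
        for (Entry e : entriesByForm.getOrDefault(form, Collections.<Entry>emptyList())) {
            if (e.posTag.equals(posTag) && (best == null || e.frequency > best.frequency)) {
                best = e;
            }
        }
        return best == null ? null : best.lemma;
    }

    public static void main(String[] args) {
        LemmaLookup dict = new LemmaLookup();
        // Illustrative entries; the counts are invented for this example.
        dict.add("saw", "see", "VBD", 120);
        dict.add("saw", "saw", "NN", 15);
        dict.add("saw", "saw", "VB", 3);

        System.out.println(dict.lookup("saw", "VBD"));
        System.out.println(dict.lookup("saw", "NN"));
    }
}
```

With the tag coming from an external tagger, the frequency only matters 
when several lemmas share the same form and tag.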

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
