Hi all,

We're using parts of LanguageTool to implement a simple lemmatizer.
Basically, we call lang.getTagger().tag(tokenText) to get the readings
and then extract the lemma from each reading.

For some word forms, the lemma appears to contain structural markup, e.g.
"besitzt" becomes "[be]sitzen" (the brackets are actually part of the
string returned by getLemma).

Are there definite rules for this structure encoding in LanguageTool?
Is there some helper method to strip it from the lemma and get only
the "raw" lemma?
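
To make clear what I mean by stripping, here is a minimal sketch of a
workaround. It assumes that square brackets are the only markup that can
occur and that simply removing them yields the raw lemma; I don't know
whether that holds in general, which is why I'm asking. LemmaCleaner and
stripStructure are hypothetical names, not part of LanguageTool.

```java
// Hypothetical helper, not a LanguageTool API.
final class LemmaCleaner {

    private LemmaCleaner() {
        // static utility, no instances
    }

    // Removes '[' and ']' from a lemma, e.g. "[be]sitzen" -> "besitzen".
    // Assumption: brackets are the only structure markers in the lemma.
    static String stripStructure(String lemma) {
        if (lemma == null) {
            return null; // some readings may carry no lemma
        }
        return lemma.replace("[", "").replace("]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripStructure("[be]sitzen")); // prints "besitzen"
    }
}
```

We would call this on the value returned by getLemma for each reading.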

Cheers,

-- Richard

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
