Hi all, we're using parts of LanguageTool to implement a simple lemmatizer. Basically, we call lang.getTagger().tag(tokenText) to get the readings and then extract the lemma information from them.
For some wordforms, the lemma appears to contain some structuring, e.g. "besitzt" becomes "[be]sitzen" (the brackets are actually in the string returned by getLemma). Are there definite rules for this structure encoding in LanguageTool? Is there some helper method to strip it from the lemma and get only the "raw" lemma?

Cheers,
-- Richard
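
P.S. To make the question concrete, here is a minimal sketch of the kind of helper meant above. It assumes that square brackets are the only markup that can appear in the lemma, which is a guess based on the "[be]sitzen" example and not confirmed LanguageTool behaviour. LemmaCleaner and stripStructure are hypothetical names and not part of LanguageTool.

```java
// Hypothetical helper, not part of LanguageTool.
final class LemmaCleaner {

    private LemmaCleaner() {
        // static utility, no instances
    }

    /**
     * Returns the lemma with every '[' and ']' removed.
     * Assumption: brackets are the only structure markers in the lemma.
     * A null lemma is passed through unchanged.
     */
    static String stripStructure(String lemma) {
        if (lemma == null) {
            return null;
        }
        return lemma.replace("[", "").replace("]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripStructure("[be]sitzen")); // prints "besitzen"
        System.out.println(stripStructure("gehen"));      // prints "gehen"
    }
}
```

This would be applied to the string returned by getLemma. If the encoding uses other markers besides brackets, the helper would need to be extended accordingly.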