Joern Kottmann created OPENNLP-1261:
---------------------------------------
Summary: Lang Detect fails to predict language on long input texts
Key: OPENNLP-1261
URL: https://issues.apache.org/jira/browse/OPENNLP-1261
Project: OpenNLP
Issue Type: Improvement
Reporter: Joern Kottmann
If the input text is very long, e.g. 100k chars, the lang detect component
fails to detect the language correctly, even though the text is written in
only one language.
This issue was tracked down to the context generator, where the counts of the
n-grams are ignored.
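
To illustrate what ignoring the counts means, here is a minimal, self-contained
sketch. It is not the OpenNLP context generator; the class and method names
(NgramCountSketch, countNgrams, presenceOnly, sampleText) are hypothetical and
exist only for this example. It assumes character bigrams and shows that a
presence-only view gives a bigram seen once the same weight as a bigram seen
1000 times, which plausibly hurts more as the input gets longer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not code from OpenNLP.
public class NgramCountSketch {

    // Count how often each character n-gram of length n occurs in the text.
    static Map<String, Integer> countNgrams(CharSequence text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            String gram = text.subSequence(i, i + n).toString();
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Presence-only view: the distinct n-grams, with their counts discarded.
    static Set<String> presenceOnly(CharSequence text, int n) {
        return countNgrams(text, n).keySet();
    }

    // Long sample input: "the " repeated 1000 times, then one stray "xq".
    static String sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("the ");
        }
        sb.append("xq");
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = sampleText();
        Map<String, Integer> counts = countNgrams(text, 2);
        Set<String> present = presenceOnly(text, 2);

        // With counts, the frequent and the stray bigram are clearly different.
        System.out.println("count(th) = " + counts.get("th"));
        System.out.println("count(xq) = " + counts.get("xq"));

        // Without counts, both are simply "present" and look equally important.
        System.out.println("present(th) = " + present.contains("th"));
        System.out.println("present(xq) = " + present.contains("xq"));
    }
}
```

Passing the counts (or a normalized frequency) on as feature weight, instead of
only the set of distinct n-grams, would keep this distinction. Whether that
matches the eventual fix for this issue is an assumption.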