[ https://issues.apache.org/jira/browse/OPENNLP-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner closed OPENNLP-327.
----------------------------------
    Resolution: Delivered

> Doccats bag of word feature generator should not use numbers as features
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-327
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-327
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Doccat
>            Reporter: Jörn Kottmann
>            Assignee: Jörn Kottmann
>            Priority: Minor
>
> It turned out that Doccat's bag-of-words feature generator can be very
> sensitive to numbers when used for language identification. Therefore, numbers
> should not be included in the bag of words.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
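For illustration only, here is a minimal, self-contained sketch of a bag-of-words extractor that leaves numeric tokens out. This is not the OpenNLP API and not necessarily how the issue was resolved. The class name `NumberFilteringBagOfWords`, the `bow=` feature prefix, and the rule for what counts as a "number" (a token with at least one digit and no letters) are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the OpenNLP API): a bag-of-words feature
 * extractor that skips numeric tokens. The "bow=" prefix and the
 * numeric-token rule are assumptions made for this example.
 */
public class NumberFilteringBagOfWords {

    /**
     * Assumed rule: a token is numeric if it contains at least one digit
     * and no letters, e.g. "2011", "3.14", "1,000".
     */
    public static boolean isNumeric(String token) {
        boolean hasDigit = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isLetter(c)) {
                return false; // mixed tokens such as "mp3" are kept as words
            }
            if (Character.isDigit(c)) {
                hasDigit = true;
            }
        }
        return hasDigit;
    }

    /** Emits one "bow=<token>" feature per non-numeric token, keeping duplicates. */
    public static List<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            if (!isNumeric(token)) {
                features.add("bow=" + token);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "contract", "ran", "until", "2011", "for", "3.14", "million"};
        // Prints: [bow=The, bow=contract, bow=ran, bow=until, bow=for, bow=million]
        System.out.println(extractFeatures(tokens));
    }
}
```

Keeping mixed alphanumeric tokens such as "mp3" is a judgment call in this sketch: digits carry little language-specific signal, but letters inside a token may still help distinguish languages.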