Joey Hong created OPENNLP-861:
---------------------------------
Summary: Add Chi-Squared Data Indexer for Feature Selection
Key: OPENNLP-861
URL: https://issues.apache.org/jira/browse/OPENNLP-861
Project: OpenNLP
Issue Type: New Feature
Components: Machine Learning
Affects Versions: 1.6.0
Reporter: Joey Hong
Priority: Minor
Fix For: 1.6.1
Text classification typically produces a very large number of features. Many of
them are independent of the category and contribute no real information gain to
the classification.
A Chi-Squared feature selection method would allow features that do not pass a
dependency threshold to be removed from the feature list, keeping the list at a
reasonable size without significantly affecting classification accuracy.
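
As a rough illustration of the idea (not the proposed indexer itself), the
sketch below scores each feature with the standard 2x2 chi-squared statistic
and keeps only those at or above a threshold. The class name ChiSquaredSketch,
its method names, the example counts, and the choice of 3.841 (the critical
value for one degree of freedom at p = 0.05) are all assumptions made for this
example; none of them come from OpenNLP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OpenNLP API.
final class ChiSquaredSketch {

    // Critical value of the chi-squared distribution, 1 degree of freedom, p = 0.05.
    static final double THRESHOLD_P05 = 3.841;

    /**
     * Chi-squared statistic for a 2x2 feature/category contingency table:
     *   n11 = events with the feature, in the category
     *   n10 = events with the feature, not in the category
     *   n01 = events without the feature, in the category
     *   n00 = events without the feature, not in the category
     * chi2 = N * (n11*n00 - n10*n01)^2
     *        / ((n11+n01) * (n11+n10) * (n10+n00) * (n01+n00))
     */
    static double chiSquared(long n11, long n10, long n01, long n00) {
        double n = n11 + n10 + n01 + n00;
        double diff = (double) n11 * n00 - (double) n10 * n01;
        double denom = (double) (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00);
        // A zero margin means the feature or the category never varies,
        // so there is no evidence of dependency; score it as 0.
        return denom == 0.0 ? 0.0 : n * diff * diff / denom;
    }

    /** Returns the features whose score is at or above the threshold, in input order. */
    static List<String> selectFeatures(Map<String, long[]> tables, double threshold) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, long[]> e : tables.entrySet()) {
            long[] t = e.getValue(); // {n11, n10, n01, n00}
            if (chiSquared(t[0], t[1], t[2], t[3]) >= threshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up counts for two features over 100 events.
        Map<String, long[]> tables = new LinkedHashMap<>();
        tables.put("goal", new long[] {40, 10, 10, 40}); // concentrated in the category
        tables.put("the", new long[] {25, 25, 25, 25});  // evenly spread, independent
        System.out.println(selectFeatures(tables, THRESHOLD_P05)); // prints [goal]
    }
}
```

With more than two categories, one common choice is to compute this score for
each category against the rest and keep a feature if its best score passes the
threshold; whether the indexer should do that is left open here.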
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)