Joey Hong created OPENNLP-861:
---------------------------------

             Summary: Add Chi-Squared Data Indexer for Feature Selection
                 Key: OPENNLP-861
                 URL: https://issues.apache.org/jira/browse/OPENNLP-861
             Project: OpenNLP
          Issue Type: New Feature
          Components: Machine Learning
    Affects Versions: 1.6.0
            Reporter: Joey Hong
            Priority: Minor
             Fix For: 1.6.1


Text classification naturally produces a large number of features. Many of them 
are independent of the category and provide no real information gain for 
classification.

A data indexer based on Chi-Squared feature selection would remove features 
whose dependence on the category falls below a threshold, keeping the feature 
list at a reasonable size without significantly affecting classification 
accuracy.
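
To make the proposal concrete, here is a minimal sketch of Chi-Squared feature 
selection in plain Java. It is not the OpenNLP API: the class 
ChiSquaredSketch, the Sample type, and the method names are hypothetical and 
exist only for illustration. It assumes binary feature presence, scores each 
feature against each category with the standard 2x2 contingency-table 
statistic, and keeps a feature if its best score over all categories reaches 
the threshold. How a real indexer would plug into OpenNLP's training pipeline 
is left open.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for illustration; not part of the OpenNLP API.
public class ChiSquaredSketch {

  /** One training event: an outcome (category) and the features seen with it. */
  public static final class Sample {
    final String outcome;
    final Set<String> features;

    public Sample(String outcome, String... features) {
      this.outcome = outcome;
      this.features = new HashSet<>(Arrays.asList(features));
    }
  }

  /**
   * Chi-squared statistic for a 2x2 contingency table, where
   * a = has feature, in category;    b = has feature, not in category;
   * c = lacks feature, in category;  d = lacks feature, not in category.
   */
  public static double chiSquared(long a, long b, long c, long d) {
    double n = a + b + c + d;
    double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
    if (denom == 0) {
      return 0.0; // an empty row or column gives no evidence of dependence
    }
    double diff = (double) a * d - (double) b * c;
    return n * diff * diff / denom;
  }

  /** Keeps features whose best score over all outcomes is >= threshold. */
  public static Set<String> selectFeatures(List<Sample> samples,
                                           double threshold) {
    Map<String, Long> outcomeCounts = new HashMap<>();
    Map<String, Long> featureCounts = new HashMap<>();
    // feature -> outcome -> number of samples containing both
    Map<String, Map<String, Long>> joint = new HashMap<>();
    for (Sample s : samples) {
      outcomeCounts.merge(s.outcome, 1L, Long::sum);
      for (String f : s.features) {
        featureCounts.merge(f, 1L, Long::sum);
        joint.computeIfAbsent(f, k -> new HashMap<>())
             .merge(s.outcome, 1L, Long::sum);
      }
    }

    long n = samples.size();
    Set<String> kept = new TreeSet<>();
    for (Map.Entry<String, Long> fe : featureCounts.entrySet()) {
      String feature = fe.getKey();
      long withFeature = fe.getValue();
      double best = 0.0;
      for (Map.Entry<String, Long> oe : outcomeCounts.entrySet()) {
        long a = joint.get(feature).getOrDefault(oe.getKey(), 0L);
        long b = withFeature - a;
        long c = oe.getValue() - a;
        long d = n - a - b - c;
        best = Math.max(best, chiSquared(a, b, c, d));
      }
      if (best >= threshold) {
        kept.add(feature);
      }
    }
    return kept;
  }
}
```

In this sketch the threshold is left to the caller. A value of 3.84 
corresponds roughly to the 0.05 significance level for one degree of freedom, 
but that choice is an assumption here, not something the issue specifies.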



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
