Rodrigo Agerri created OPENNLP-715:
--------------------------------------

             Summary: Clark clusters NameFinder features
                 Key: OPENNLP-715
                 URL: https://issues.apache.org/jira/browse/OPENNLP-715
             Project: OpenNLP
          Issue Type: New Feature
          Components: Name Finder
    Affects Versions: 1.6.0
            Reporter: Rodrigo Agerri
            Assignee: Rodrigo Agerri
            Priority: Minor
             Fix For: 1.6.0


Add token-based features from Clark clusters (Clark 2003). This feature is
essentially the same as the one implemented in WordClusterFeatureGenerator,
but the two should be kept separate (for example, by giving each one a dynamic
prefix id, as is done for the dictionary features), because it has been shown
that combining these clustering-based features improves results.
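A minimal sketch of the "dynamic prefix id" idea, written as plain Java rather
than against the actual OpenNLP feature generator interfaces. The class name
PrefixedClusterFeatureGenerator, the prefix strings and the lower-casing of
tokens are all hypothetical choices for illustration. The point is that the
same cluster id from two different clusterings yields two distinct features,
so the model can weight them independently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a token-level cluster feature generator with a
// configurable prefix, so features from different clusterings stay distinct.
class PrefixedClusterFeatureGenerator {
    private final String prefix;                // assumed per-resource id
    private final Map<String, String> clusters; // token -> cluster id

    PrefixedClusterFeatureGenerator(String prefix, Map<String, String> clusters) {
        this.prefix = prefix;
        this.clusters = clusters;
    }

    // Adds at most one feature for the token at the given index.
    // Assumption: lookups are case-insensitive (keys stored lower-cased).
    void createFeatures(List<String> features, String[] tokens, int index) {
        String cluster = clusters.get(tokens[index].toLowerCase());
        if (cluster != null) {
            features.add(prefix + "=" + cluster);
        }
    }

    public static void main(String[] args) {
        // Toy data: both clusterings happen to assign "london" to id 17.
        Map<String, String> clark = new HashMap<>();
        clark.put("london", "17");
        Map<String, String> other = new HashMap<>();
        other.put("london", "17");

        PrefixedClusterFeatureGenerator clarkGen =
            new PrefixedClusterFeatureGenerator("clark", clark);
        PrefixedClusterFeatureGenerator otherGen =
            new PrefixedClusterFeatureGenerator("wordcluster", other);

        String[] tokens = {"He", "visited", "London"};
        List<String> features = new ArrayList<>();
        clarkGen.createFeatures(features, tokens, 2);
        otherGen.createFeatures(features, tokens, 2);
        System.out.println(features); // [clark=17, wordcluster=17]
    }
}
```

Without distinct prefixes both generators would emit the same feature string,
and the contribution of each clustering could not be separated.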

Clark clusters can be generated using this tool: 

https://github.com/ninjin/clark_pos_induction
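Once the clusters are generated, they have to be loaded into a token-to-cluster
lookup table. The sketch below assumes that each non-empty line of the output
holds a word and its cluster id separated by whitespace, optionally followed by
further fields such as a probability. This format is an assumption and should be
checked against the tool's actual output. The class name ClusterFileLoader is
hypothetical, and the sample input is toy data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader for a word-cluster file.
// ASSUMPTION: each line is "word clusterId [extra fields...]", whitespace-separated.
class ClusterFileLoader {
    static Map<String, String> load(Reader in) {
        Map<String, String> clusters = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\\s+");
                if (fields.length < 2) {
                    continue; // skip malformed lines without a cluster id
                }
                // Keys are lower-cased so lookups can be case-insensitive.
                clusters.put(fields[0].toLowerCase(), fields[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return clusters;
    }

    public static void main(String[] args) {
        String sample = "london 17 0.93\nparis 17 0.88\nvisited 4 0.51\n";
        Map<String, String> clusters = load(new StringReader(sample));
        System.out.println(clusters.get("paris")); // 17
    }
}
```

The caller owns the Reader and is responsible for closing it. In practice
this would wrap a file reader opened with an explicit charset.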



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
