Rodrigo Agerri created OPENNLP-715:
--------------------------------------
Summary: Clark clusters NameFinder features
Key: OPENNLP-715
URL: https://issues.apache.org/jira/browse/OPENNLP-715
Project: OpenNLP
Issue Type: New Feature
Components: Name Finder
Affects Versions: 1.6.0
Reporter: Rodrigo Agerri
Assignee: Rodrigo Agerri
Priority: Minor
Fix For: 1.6.0
Add token-based features from Clark clusters (Clark 2003). This feature is
essentially the same as the one implemented in WordClusterFeatureGenerator,
but the two should be kept separate (perhaps by giving each a dynamic prefix
id, as in the dictionary features), since combining these clustering-based
features has been shown to improve results.
Clark clusters can be generated using this tool:
https://github.com/ninjin/clark_pos_induction
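A rough sketch of the "dynamic prefix" idea, to make the proposal concrete. It
is not the OpenNLP API: the class name PrefixedClusterFeatures, its methods, and
the prefixes used below are hypothetical, and the lexicon format (one
whitespace-separated "token clusterId [score]" entry per line) is an assumption
about the clustering tool's output. Each lexicon gets its own prefix, so
features from different clusterings stay distinct even when the cluster ids
coincide.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of OpenNLP.
final class PrefixedClusterFeatures {

    // Prefix that identifies which cluster lexicon a feature came from.
    private final String prefix;
    private final Map<String, String> tokenToCluster;

    PrefixedClusterFeatures(String prefix, Map<String, String> tokenToCluster) {
        this.prefix = prefix;
        this.tokenToCluster = tokenToCluster;
    }

    // Assumed format: "token clusterId [score]" per line, whitespace-separated.
    // Lines with fewer than two fields (including blank lines) are skipped.
    static Map<String, String> parseLexicon(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                map.put(parts[0], parts[1]);
            }
        }
        return map;
    }

    // Adds "<prefix>=<clusterId>" for the token at index, if it is in the lexicon.
    void addFeatures(List<String> features, String[] tokens, int index) {
        String cluster = tokenToCluster.get(tokens[index]);
        if (cluster != null) {
            features.add(prefix + "=" + cluster);
        }
    }
}
```

With one instance built with prefix "clark" and another with prefix
"wordcluster", the same token can contribute both a "clark=..." and a
"wordcluster=..." feature, so the model can weight the two clusterings
independently.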