[ https://issues.apache.org/jira/browse/OPENNLP-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229441#comment-13229441 ]
Jim Piliouras commented on OPENNLP-78:
--------------------------------------
Actually, it sounds a lot easier to have the option of using the dictionary
post model creation, just for evaluation purposes (as James stated), without
touching the feature generation. It does sound like significantly less work, and it
would boost the results of people who actually have dictionaries (like me). I do get
the point about training more on the surrounding tokens, but again, you can never be
sure what to expect from a corpus. Sometimes it might be good, sometimes it
might be bad... For example, I'm dealing with drug names, which most of the time
show very strong morphological characteristics. Some of them are so strong and
unique that you can find them with a regex. That leads to very informative
features, doesn't it? That is why I'm getting such good results despite not
having the recommended amount of training data (I only have 3,800 sentences). I
guess learning most features from the entity itself works really well for me,
but what would happen if I were looking for person names with so little training
data? I really wonder... I can see that your pre-trained model for names is 5 MB,
whereas my drug model is only 387 KB and still gets 94% precision and 73%
recall. Anyway, I vote for using the dictionary after deploying the maxent model,
for the sake of better results when evaluating...
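A minimal sketch of what "using the dictionary after the maxent model" could look like as a pure post-processing step, leaving feature generation untouched. It is written in Python rather than against OpenNLP's Java API, and everything in it is an assumption for illustration: the `(start, end, type)` tuples stand in loosely for the `Span[]` arrays that `NameFinderME` and `DictionaryNameFinder` return, the function names are hypothetical, and the rule "model spans win on overlap" is just one possible merge policy, not something decided in this issue.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for an OpenNLP Span: (start, end, type) over token
# indices, with end exclusive.
Span = Tuple[int, int, str]


def dictionary_spans(tokens: List[str], entries: Set[Tuple[str, ...]],
                     label: str, max_len: int = 4) -> List[Span]:
    """Greedy longest-match lookup of token sequences in a dictionary.

    Assumes dictionary entries are stored as lowercase token tuples.
    """
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate first so multi-token entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                match_end = i + n
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end, label))
            i = match_end  # skip past the matched entry
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_spans(model: List[Span], dictionary: List[Span]) -> List[Span]:
    """Keep every model span; add only dictionary spans that overlap none."""
    merged = list(model)
    for d in dictionary:
        if not any(overlaps(d, m) for m in model):
            merged.append(d)
    return sorted(merged)
```

Usage, with made-up tokens and a pretend model output that missed one drug:

```python
tokens = ["Patients", "received", "atorvastatin", "and", "insulin", "glargine", "daily"]
drug_dict = {("atorvastatin",), ("insulin", "glargine")}
model_out = [(2, 3, "drug")]
dict_out = dictionary_spans(tokens, drug_dict, "drug")  # [(2, 3, 'drug'), (4, 6, 'drug')]
merge_spans(model_out, dict_out)                        # [(2, 3, 'drug'), (4, 6, 'drug')]
```

Because the dictionary only adds spans where the model found nothing, this can only raise recall; whether precision holds up depends on how clean the dictionary is.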
Hope I didn't bore you!
Jim
> NameFinder and Dictionary Integration
> -------------------------------------
>
> Key: OPENNLP-78
> URL: https://issues.apache.org/jira/browse/OPENNLP-78
> Project: OpenNLP
> Issue Type: New Feature
> Components: Name Finder
> Environment: Windows 7
> Reporter: James Kosin
> Assignee: James Kosin
> Priority: Minor
>
> Now that we have a NameFinder Dictionary and improved NameFinder tools, it
> would be nice to be able to integrate the dictionary and model to help
> improve the finding of names.
> This way, the name finder could be trained more on the surrounding text
> instead of attempting to memorize common names in the news that occur
> frequently.
> I've already got the name finder corpus and created the dictionaries from the
> US Census data.
> I just need to implement some method to help train the model, or to be able to
> use the dictionaries post model creation to help with the finding of names.