[jira] [Commented] (OPENNLP-78) NameFinder and Dictionary Integration

Jim Piliouras (Commented) (JIRA) Wed, 14 Mar 2012 11:03:02 -0700

    [ 
https://issues.apache.org/jira/browse/OPENNLP-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229441#comment-13229441
 ]


Jim Piliouras commented on OPENNLP-78:
--------------------------------------

Actually, it sounds a lot easier to have the option of applying the dictionary 
after model creation, purely at evaluation time (as James stated), without 
touching the feature generation. That is significantly less work, yet it would 
still boost the results of people who actually have dictionaries (like me). I 
do get the point about training more on the surrounding tokens, but you can 
never be sure what to expect from a corpus: sometimes it helps, sometimes it 
hurts. For example, I'm dealing with drug names that exhibit very strong 
morphological characteristics most of the time. Some of these patterns are so 
strong and unique that you can find the names with a regex. That leads to very 
informative features, doesn't it? That is why I'm getting such good results 
despite not having the recommended amount of training data (I only have 3,800 
sentences). I guess learning most features from the entity itself works really 
well for me, but what would happen if I were looking for person names with so 
little training data? I really wonder. I can see that your pre-trained model 
for names is 5 MB, whereas my drug model is only 387 KB and still gets 94% 
precision and 73% recall. Anyway, I vote for using the dictionary after 
deploying the maxent model, for the sake of better results when evaluating.
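
To make the "dictionary after the model" idea concrete, here is a minimal 
sketch of one way it could work. It is an illustration only, not OpenNLP code: 
the class DictionaryPostProcessor, the TokenSpan type and both methods are 
hypothetical names, the dictionary is assumed to hold lower-cased single-token 
entries, and the model's output is assumed to arrive as a list of token spans. 
Model spans are left untouched, and a dictionary hit is added only where the 
model found nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not OpenNLP code: every name below is made up for illustration.
final class DictionaryPostProcessor {

    /** Half-open token range [start, end); a stand-in for a name finder's span type. */
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True when the two token ranges share at least one token. */
        boolean overlaps(TokenSpan other) {
            return start < other.end && other.start < end;
        }
    }

    /** Returns one span per token found in the dictionary (entries assumed lower-cased). */
    static List<TokenSpan> dictionaryHits(String[] tokens, Set<String> dictionary) {
        List<TokenSpan> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (dictionary.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                hits.add(new TokenSpan(i, i + 1));
            }
        }
        return hits;
    }

    /** Keeps every model span and adds only the dictionary spans that overlap none of them. */
    static List<TokenSpan> merge(List<TokenSpan> modelSpans, List<TokenSpan> dictionarySpans) {
        List<TokenSpan> merged = new ArrayList<>(modelSpans);
        for (TokenSpan candidate : dictionarySpans) {
            boolean clashes = false;
            for (TokenSpan found : modelSpans) {
                if (candidate.overlaps(found)) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                merged.add(candidate);
            }
        }
        // Sort by start token so callers see the spans in sentence order.
        merged.sort(Comparator.comparingInt((TokenSpan s) -> s.start));
        return merged;
    }
}
```

Calling merge(modelSpans, dictionaryHits(tokens, dictionary)) on each sentence 
before scoring would let the dictionary fill in names the model missed without 
changing training or feature generation. Whether that helps overall depends on 
the dictionary, since every extra hit can also be a false positive.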

Hope I didn't bore you!

Jim 
                
> NameFinder and Dictionary Integration
> -------------------------------------
>
>                 Key: OPENNLP-78
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-78
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>         Environment: Windows 7
>            Reporter: James Kosin
>            Assignee: James Kosin
>            Priority: Minor
>
> Now that we have a NameFinder Dictionary and improved NameFinder tools; it 
> would be nice to be able to integrate the dictionary and model to help 
> improve the finding of names.
> This way, the name finder could be trained more on the surrounding text 
> instead of attempting to memorize common names in the news that occur 
> frequently.
> I've already got the name finder corpus, created the dictionaries with the 
> data from the US Census.
> I just need to implement some method to help train the model; or be able to 
> use the dictionaries post model creation to help with the finding of names.

