It is now catching both names in the sentence correctly. I've got it all fixed and committed to SVN. Could you review?
Thanks
James

On 3/19/2012 11:25 PM, [email protected] wrote:
> I am not sure which is best. We just need to make sure that it correctly
> catches two consecutive names:
>
> <START> Luise <END> <START> George Bauer <END>
>
> On Tue, Mar 20, 2012 at 12:07 AM, James Kosin <[email protected]> wrote:
>
>> Thanks....
>>
>> We have some collateral damage: NameSampleDataStreamTest is now
>> failing....
>> I'm trying to fix all this locally as well. Yes...
>> Although your idea looks better than the getEnd() method I was using
>> and checked in... I may change it.
>>
>> James
>>
>> On 3/19/2012 10:51 PM, [email protected] wrote:
>>> Looks like we were checking it at the same time.
>>>
>>> Yes, I got confused because of this log. Actually, this error is caused
>>> by an issue in the corpus. The sentence is not correctly annotated:
>>>
>>> ... to visit us, <START> Luise <END> and <START> George Bauer <END>.
>>> The <END> was not caught because of the '.', and the corpus parser got
>>> confused.
>>>
>>> On Mon, Mar 19, 2012 at 11:41 PM, James Kosin <[email protected]> wrote:
>>>> William,
>>>>
>>>> I found a problem with the longest match... I wasn't jumping over the
>>>> match after adding it to the spans. This could cause George Bauer and
>>>> Bauer to be returned as two entries if Bauer was in the dictionary.
>>>>
>>>> I'm a little confused by an output I'm getting now:
>>>> -----
>>>> Running opennlp.tools.namefind.DictionaryNameFinderEvaluatorTest
>>>> Expected: {
>>>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>>>> visit us, <START:default> Luise <END> and George Bauer <END>.}
>>>> Predicted: {
>>>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>>>> visit us, <START:default> Luise <END> and <START:default> George <END>
>>>> Bauer <END>.}
>>>> False positives: {
>>>> [George]
>>>> } False negatives: {
>>>> []
>>>> }
>>>> ----
>>>> after adding the line assertTrue(fmeasure.getFmeasure() == 1); to the test
>>>> file...?
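The "jumping over the match" fix described above can be illustrated with a minimal greedy longest-match loop. This is a hypothetical sketch, not OpenNLP's DictionaryNameFinder: the class name `LongestMatchSketch`, the `maxEntryLength` bound, and storing entries as space-joined strings in a `Set<String>` are all assumptions made for illustration. After a match, the scan resumes at the end of that span, so with both "George Bauer" and "Bauer" in the dictionary only the longer name is reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a greedy longest-match dictionary lookup.
 * Not OpenNLP's implementation: entries are assumed to be space-joined
 * token strings held in a plain Set<String>.
 */
class LongestMatchSketch {

    /** Returns [start, end) token spans of dictionary names, preferring the longest match. */
    static List<int[]> find(String[] tokens, Set<String> dictionary, int maxEntryLength) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchEnd = -1;
            // Try the longest candidate first and shrink until a dictionary hit.
            int limit = Math.min(tokens.length, start + maxEntryLength);
            for (int end = limit; end > start; end--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, start, end));
                if (dictionary.contains(candidate)) {
                    matchEnd = end;
                    break;
                }
            }
            if (matchEnd != -1) {
                spans.add(new int[] {start, matchEnd});
                // Jump over the matched tokens so "Bauer" inside "George Bauer"
                // is not reported again as a second, overlapping name.
                start = matchEnd;
            } else {
                start++;
            }
        }
        return spans;
    }
}
```

With the tokens `visit us , Luise and George Bauer .`, the dictionary {"Luise", "George Bauer", "Bauer"} and `maxEntryLength = 2`, this returns the spans [3, 4) and [5, 7); advancing by a single token after a match would additionally report [6, 7) for "Bauer".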
>>>>
>>>> On 3/19/2012 10:11 PM, William Colen (Commented) (JIRA) wrote:
>>>>> [ https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233124#comment-13233124 ]
>>>>>
>>>>> William Colen commented on OPENNLP-471:
>>>>> ---------------------------------------
>>>>>
>>>>> Oops... sorry! I am debugging the code and might have found the real
>>>>> reason for the previous output. I will investigate it further.
>>>>>
>>>>>> DictionaryNameFinder has HASHing issues
>>>>>> ---------------------------------------
>>>>>>
>>>>>> Key: OPENNLP-471
>>>>>> URL: https://issues.apache.org/jira/browse/OPENNLP-471
>>>>>> Project: OpenNLP
>>>>>> Issue Type: Bug
>>>>>> Components: Name Finder
>>>>>> Reporter: James Kosin
>>>>>> Assignee: James Kosin
>>>>>> Labels: dictionary, namefinder
>>>>>> Fix For: tools-1.5.3
>>>>>>
>>>>>> The DictionaryNameFinder has trouble finding multi-token names when the
>>>>>> dictionary is searched one token at a time by the find() method and the
>>>>>> dictionary has no single-token (or shorter) match available.
>>>>>> With a dictionary containing {"folic", "acid"} but no entry for
>>>>>> {"folic"}, the find() method completely misses the fact that a longer
>>>>>> match is possible.
>>>>>> Thanks to Jim for pushing this, and to my debugging skills for finding it.
>>>>>> Three possibilities come to mind:
>>>>>> 1) One I don't really like: we turn it into a larger problem by trying
>>>>>> longer matches when shorter ones don't match. Unfortunately, this
>>>>>> quickly turns into a race to see who can wait longer.
>>>>>> 2) A way of returning a possible match that may need exploring, or a
>>>>>> look-ahead system that says we don't match "folic", but if "acid"
>>>>>> follows "folic" we have a match for that in the dictionary.
>>>>>> 3) Leave it as is and modify the dictionary to add shorter terms to
>>>>>> the dictionary...
>>>>>> maybe marking them as invalid entries so we know we need a longer match.
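Option 2, a look-ahead that knows "folic" may begin a longer entry, can be sketched by storing every proper prefix of each multi-token entry next to the entries themselves. The class `PrefixDictionarySketch` and its `add`/`find` methods are hypothetical and do not use OpenNLP's classes; entries are keyed as space-joined strings, which assumes tokens contain no spaces. The scan keeps extending while the current token sequence is a known prefix, remembers the longest full entry it has seen, and then jumps past it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical prefix-aware dictionary; a sketch of the "look-ahead" idea only. */
class PrefixDictionarySketch {
    // Complete entries, e.g. "folic acid".
    private final Set<String> entries = new HashSet<>();
    // Proper prefixes of multi-token entries, e.g. "folic" for "folic acid".
    private final Set<String> prefixes = new HashSet<>();

    void add(String... tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens[i]);
            if (i < tokens.length - 1) {
                prefixes.add(sb.toString()); // mark "may continue" without making it a name
            }
        }
        entries.add(sb.toString());
    }

    /** Returns [start, end) spans of the longest entries found, scanning left to right. */
    List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchEnd = -1;
            StringBuilder candidate = new StringBuilder();
            for (int end = start; end < tokens.length; end++) {
                if (end > start) candidate.append(' ');
                candidate.append(tokens[end]);
                String key = candidate.toString();
                if (entries.contains(key)) {
                    matchEnd = end + 1; // longest full entry seen so far
                }
                if (!prefixes.contains(key)) {
                    break; // no longer entry can start with this sequence
                }
            }
            if (matchEnd != -1) {
                spans.add(new int[] {start, matchEnd});
                start = matchEnd; // skip past the match
            } else {
                start++;
            }
        }
        return spans;
    }
}
```

With only "folic acid" added, `find` on `take folic acid daily` returns the span [1, 3), while `folic tablets` yields nothing, so no separate "folic" entry is needed. The prefix set grows with the total number of tokens across all entries; a token trie is the usual alternative for large dictionaries.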
