Looks like we were checking it at the same time. Yes, I got confused because of this log. The error is actually caused by an issue in the corpus. The sentence is not correctly annotated:
... to visit us, <START> Luise <END> and <START> George Bauer <END>.

The <END> was not caught because of the '.', and the corpus parser got confused.

On Mon, Mar 19, 2012 at 11:41 PM, James Kosin <[email protected]> wrote:

> William,
>
> I found a problem with the longest match... I wasn't jumping over the
> match after adding it to the spans. This could cause George Bauer and
> Bauer to return as two entries if Bauer was in the dictionary.
>
> I'm a little confused on an output I'm getting now:
> -----
> Running opennlp.tools.namefind.DictionaryNameFinderEvaluatorTest
> Expected: {
> Since then, our guests have to ring at Veilchenstraße 11 if they want to
> visit us, <START:default> Luise <END> and George Bauer <END>.}
> Predicted: {
> Since then, our guests have to ring at Veilchenstraße 11 if they want to
> visit us, <START:default> Luise <END> and <START:default> George <END>
> Bauer <END>.}
> False positives: {
> [George]
> } False negatives: {
> []
> }
> ----
> after adding a line assertTrue(fmeasure.getFmeasure() == 1); to the test
> file...?
>
> On 3/19/2012 10:11 PM, William Colen (Commented) (JIRA) wrote:
> > [ https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233124#comment-13233124]
> >
> > William Colen commented on OPENNLP-471:
> > ---------------------------------------
> >
> > Ops... sorry ! I am debugging the code and might have found the real
> > reason for the previous output. I will investigate it further.
> >
> >> DictionaryNameFinder has HASHing issues
> >> ---------------------------------------
> >>
> >>                 Key: OPENNLP-471
> >>                 URL: https://issues.apache.org/jira/browse/OPENNLP-471
> >>             Project: OpenNLP
> >>          Issue Type: Bug
> >>          Components: Name Finder
> >>            Reporter: James Kosin
> >>            Assignee: James Kosin
> >>              Labels: dictionary, namefinder
> >>             Fix For: tools-1.5.3
> >>
> >>
> >> The DictionaryNameFinder has issues finding multi-token names when the
> >> dictionary is searched a token at a time by the find() method. If, the
> >> dictionary doesn't have a single (or shorter) token match available in the
> >> dictionary.
> >> Having a dictionary with {"folic", "acid"} without an entry for
> >> {"folic"} will cause the find() method to totally skip the fact there is a
> >> longer match possible.
> >> Thanks to Jim for pushing this and to my debugging skills to find.
> >> Two possiblilites come to mind:
> >> 1) I don't really like, is we turn it into a larger problem by trying
> >> longer matches when shorter ones don't match. Unfortunately, this turns
> >> quickly into a race to see who can wait longer.
> >> 2) A way of returning a possible match that may need exploring, or a
> >> look-ahead type system to say we don't match "folic" but if you have "acid"
> >> after "folic" we have a match for that in the dictionary.
> >> 3) Leave it as is and modify the dictionary to add shorter terms to
> >> the dictionary... maybe marking as not-a-valid entry so we can know we need
> >> a longer match.
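
For anyone following the thread, here is a minimal sketch of the two behaviours discussed above: trying the longest candidate first (so {"folic", "acid"} is found without a {"folic"} entry) and jumping over a match once it is added to the spans (so "Bauer" is not reported again after "George Bauer"). This is not the OpenNLP code. The class name LongestMatchSketch, the find() signature, the maxTokens bound, and the use of a plain Set of token lists as the dictionary are all assumptions made for illustration, and matching here is exact and case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the DictionaryNameFinder implementation.
class LongestMatchSketch {

    /**
     * Returns half-open [start, end) token spans. At each position the longest
     * dictionary entry (up to maxTokens tokens) wins, and the scan resumes
     * after the matched span.
     */
    static List<int[]> find(String[] tokens, Set<List<String>> dictionary, int maxTokens) {
        List<String> tokenList = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            int longestEnd = Math.min(tokens.length, start + maxTokens);
            // Longest candidate first: a multi-token entry is found even when
            // its first token alone is not in the dictionary.
            for (int end = longestEnd; end > start; end--) {
                if (dictionary.contains(tokenList.subList(start, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd != -1) {
                spans.add(new int[] {start, matchedEnd});
                start = matchedEnd; // jump over the match so its tail is not matched again
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dictionary = new HashSet<>();
        dictionary.add(Arrays.asList("Luise"));
        dictionary.add(Arrays.asList("George", "Bauer"));
        dictionary.add(Arrays.asList("Bauer"));

        String[] tokens = {"Luise", "and", "George", "Bauer", "."};
        for (int[] span : find(tokens, dictionary, 2)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```

The maxTokens bound keeps the look-ahead from growing with sentence length, which is one way to limit the cost that option 1 in the issue worries about. It would have to be at least as large as the longest dictionary entry.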
