Re: [jira] [Commented] (OPENNLP-471) DictionaryNameFinder has HASHing issues

[email protected] Mon, 19 Mar 2012 19:52:17 -0700

Looks like we were checking it at the same time.

Yes, because of this log I got confused. Actually this error is caused by
an issue in the corpus. The sentence is not correctly annotated:


... to visit us, <START> Luise <END> and <START> George Bauer <END>.
The <END> was not caught because of the '.' and the corpus parser got
confused.
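
To illustrate why the attached '.' matters, here is a minimal sketch. It assumes a parser that splits the line on whitespace and compares each token for exact equality with <END>; the class and method names are hypothetical and this is not the actual corpus parser code.

```java
public class AnnotationTagCheck {

    // Counts tokens that are exactly "<END>" after splitting on whitespace.
    static int countEndTags(String line) {
        int count = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("<END>")) {
                count++;
            }
        }
        return count;
    }

    // Counts tokens that open a name span, e.g. "<START>" or "<START:default>".
    static int countStartTags(String line) {
        int count = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.matches("<START(:[^>]+)?>")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "<END>." is a single token here, so the second span is never closed.
        String broken = "to visit us, <START> Luise <END> and <START> George Bauer <END>.";
        // With a space before the period, both spans open and close.
        String fixed = "to visit us, <START> Luise <END> and <START> George Bauer <END> .";

        System.out.println(countStartTags(broken) + " starts, " + countEndTags(broken) + " ends");
        System.out.println(countStartTags(fixed) + " starts, " + countEndTags(fixed) + " ends");
    }
}
```

Under that assumption the first line reports 2 starts but only 1 end, which matches the unbalanced annotation seen in the log. Separating the period from the tag in the corpus balances the counts.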


On Mon, Mar 19, 2012 at 11:41 PM, James Kosin <[email protected]> wrote:

> William,
>
> I found a problem with the longest match... I wasn't jumping over the
> match after adding it to the spans.  This could cause George Bauer and
> Bauer to return as two entries if Bauer was in the dictionary.
>
> I'm a little confused on an output I'm getting now:
> -----
> Running opennlp.tools.namefind.DictionaryNameFinderEvaluatorTest
> Expected: {
> Since then, our guests have to ring at Veilchenstraße 11 if they want to
> visit us, <START:default> Luise <END> and George Bauer <END>.}
> Predicted: {
> Since then, our guests have to ring at Veilchenstraße 11 if they want to
> visit us, <START:default> Luise <END> and <START:default> George <END>
> Bauer <END>.}
> False positives: {
> [George]
> } False negatives: {
> []
> }
> ----
> after adding a line assertTrue(fmeasure.getFmeasure() == 1); to the test
> file...?
>
>
>
>
> On 3/19/2012 10:11 PM, William Colen (Commented) (JIRA) wrote:
> >     [
> https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233124#comment-13233124]
> >
> > William Colen commented on OPENNLP-471:
> > ---------------------------------------
> >
> > Oops... sorry! I am debugging the code and might have found the real
> reason for the previous output. I will investigate it further.
> >
> >> DictionaryNameFinder has HASHing issues
> >> ---------------------------------------
> >>
> >>                 Key: OPENNLP-471
> >>                 URL: https://issues.apache.org/jira/browse/OPENNLP-471
> >>             Project: OpenNLP
> >>          Issue Type: Bug
> >>          Components: Name Finder
> >>            Reporter: James Kosin
> >>            Assignee: James Kosin
> >>              Labels: dictionary, namefinder
> >>             Fix For: tools-1.5.3
> >>
> >>
> >> The DictionaryNameFinder has issues finding multi-token names when the
> dictionary is searched a token at a time by the find() method and the
> dictionary doesn't have a single-token (or shorter) match available.
> >> Having a dictionary with {"folic", "acid"} without an entry for
> {"folic"} will cause the find() method to totally skip the fact there is a
> longer match possible.
> >> Thanks to Jim for pushing this and to my debugging skills to find.
> >> A few possibilities come to mind:
> >> 1)  One I don't really like: we turn it into a larger problem by trying
> longer matches when shorter ones don't match.  Unfortunately, this turns
> quickly into a race to see who can wait longer.
> >> 2)  A way of returning a possible match that may need exploring, or a
> look-ahead type system to say we don't match "folic" but if you have "acid"
> after "folic" we have a match for that in the dictionary.
> >> 3)  Leave it as is and modify the dictionary to add shorter terms to
> the dictionary... maybe marking as not-a-valid entry so we can know we need
> a longer match.
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
>
>
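
For reference, the longest-match behaviour discussed in the quoted messages can be sketched roughly as follows. This is not the DictionaryNameFinder source: the class name, the find() signature, the maxLen parameter, and the use of a Set of token lists as the dictionary are all assumptions made for illustration. At each position it tries the longest candidate first, and after a match it jumps over the matched tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LongestMatchSketch {

    // Returns {start, end} pairs (end exclusive) for the longest dictionary
    // match at each position. maxLen is the longest entry length to try.
    static List<int[]> find(String[] tokens, Set<List<String>> dictionary, int maxLen) {
        List<String> tokenList = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            // Longest candidate first, so {"folic", "acid"} is found even
            // when {"folic"} alone is not an entry.
            int longest = Math.min(tokens.length, start + maxLen);
            for (int end = longest; end > start; end--) {
                if (dictionary.contains(tokenList.subList(start, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd != -1) {
                spans.add(new int[] {start, matchedEnd});
                // Jump over the match so "Bauer" is not reported again
                // after "George Bauer".
                start = matchedEnd;
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dictionary = new HashSet<>();
        dictionary.add(Arrays.asList("Luise"));
        dictionary.add(Arrays.asList("George", "Bauer"));
        dictionary.add(Arrays.asList("Bauer"));

        String[] tokens = {"Luise", "and", "George", "Bauer", "."};
        for (int[] span : find(tokens, dictionary, 2)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```

With these inputs the sketch reports the spans [0, 1] and [2, 4] and does not return "Bauer" as a separate entry. The cost is up to maxLen lookups per token, which is the trade-off raised in option 1 of the issue description.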
