Re: [jira] [Commented] (OPENNLP-471) DictionaryNameFinder has HASHing issues

James Kosin Mon, 19 Mar 2012 20:07:31 -0700

Thanks....

We have some collateral damage: NameSampleDataStreamTest is now
failing.
I'm trying to fix all this locally as well.
Your idea looks better than the getEnd() approach I was using and have
already checked in, so I may change it.


James

On 3/19/2012 10:51 PM, [email protected] wrote:
> Looks like we were checking it at the same time.
>
> Yes, because of this log I got confused. Actually this error is caused by
> an issue in the corpus. The sentence is not correctly annotated:
>
> ... to visit us, <START> Luise <END> and <START> George Bauer <END>.
> The <END> was not caught because of the '.', and the corpus parser got
> confused.
>
>
> On Mon, Mar 19, 2012 at 11:41 PM, James Kosin <[email protected]> wrote:
>
>> William,
>>
>> I found a problem with the longest match... I wasn't jumping over the
>> match after adding it to the spans.  This could cause George Bauer and
>> Bauer to return as two entries if Bauer was in the dictionary.
>>
>> I'm a little confused on an output I'm getting now:
>> -----
>> Running opennlp.tools.namefind.DictionaryNameFinderEvaluatorTest
>> Expected: {
>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>> visit us, <START:default> Luise <END> and George Bauer <END>.}
>> Predicted: {
>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>> visit us, <START:default> Luise <END> and <START:default> George <END>
>> Bauer <END>.}
>> False positives: {
>> [George]
>> } False negatives: {
>> []
>> }
>> ----
>> after adding a line assertTrue(fmeasure.getFmeasure() == 1); to the test
>> file...?
>>
>>
>>
>>
>> On 3/19/2012 10:11 PM, William Colen (Commented) (JIRA) wrote:
>>>     [
>> https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233124#comment-13233124]
>>> William Colen commented on OPENNLP-471:
>>> ---------------------------------------
>>>
>>> Oops... sorry! I am debugging the code and might have found the real
>> reason for the previous output. I will investigate it further.
>>>> DictionaryNameFinder has HASHing issues
>>>> ---------------------------------------
>>>>
>>>>                 Key: OPENNLP-471
>>>>                 URL: https://issues.apache.org/jira/browse/OPENNLP-471
>>>>             Project: OpenNLP
>>>>          Issue Type: Bug
>>>>          Components: Name Finder
>>>>            Reporter: James Kosin
>>>>            Assignee: James Kosin
>>>>              Labels: dictionary, namefinder
>>>>             Fix For: tools-1.5.3
>>>>
>>>>
>>>> The DictionaryNameFinder has issues finding multi-token names when the
>> dictionary is searched a token at a time by the find() method, if the
>> dictionary doesn't have a single-token (or shorter) match available for
>> the start of the name.
>>>> Having a dictionary with {"folic", "acid"} without an entry for
>> {"folic"} will cause the find() method to totally skip the fact there is a
>> longer match possible.
>>>> Thanks to Jim for pushing this and to my debugging skills to find.
>>>> Three possibilities come to mind:
>>>> 1)  One I don't really like: we turn it into a larger problem by trying
>> longer matches when shorter ones don't match.  Unfortunately, this turns
>> quickly into a race to see who can wait longer.
>>>> 2)  A way of returning a possible match that may need exploring, or a
>> look-ahead type system to say we don't match "folic" but if you have "acid"
>> after "folic" we have a match for that in the dictionary.
>>>> 3)  Leave it as is and modify the dictionary to add shorter terms to
>> the dictionary... maybe marking as not-a-valid entry so we can know we need
>> a longer match.
>>
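
The longest-match behaviour discussed in this thread (try the longer token window first, then jump past the matched span so that "Bauer" is not reported again inside "George Bauer") can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DictionaryNameFinder code: the class name LongestMatchSketch, the find() signature, the maxTokens limit, and the use of a Set of token lists as the dictionary are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; NOT the OpenNLP DictionaryNameFinder implementation.
public class LongestMatchSketch {

    /**
     * Returns [start, end) token offsets of dictionary matches.
     * The dictionary is modelled (as an assumption) as a set of token lists.
     */
    public static List<int[]> find(String[] tokens,
                                   Set<List<String>> dictionary,
                                   int maxTokens) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            int longest = Math.min(tokens.length, start + maxTokens);
            // Try the longest window first, so {"folic", "acid"} is found
            // even when {"folic"} alone is not in the dictionary.
            for (int end = longest; end > start; end--) {
                List<String> candidate =
                        Arrays.asList(Arrays.copyOfRange(tokens, start, end));
                if (dictionary.contains(candidate)) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd != -1) {
                spans.add(new int[] {start, matchedEnd});
                // Jump over the match so its inner tokens are not matched again.
                start = matchedEnd;
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dict = new HashSet<>();
        dict.add(Arrays.asList("George", "Bauer"));
        dict.add(Arrays.asList("Bauer"));
        dict.add(Arrays.asList("Luise"));
        String[] tokens = {"visit", "us", ",", "Luise", "and", "George", "Bauer", "."};
        for (int[] span : find(tokens, dict, 2)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

Bounding the window with maxTokens (for example, the length of the longest dictionary entry) keeps the extra lookups per position small, which is one way to limit the cost worried about in option 1 of the issue description.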
