Re: [jira] [Commented] (OPENNLP-471) DictionaryNameFinder has HASHing issues

James Kosin Mon, 19 Mar 2012 20:33:15 -0700

It is catching both names correctly in the sentence now.
I've fixed everything and committed it to SVN.
Could you review it?


Thanks
James

On 3/19/2012 11:25 PM, [email protected] wrote:
> I am not sure which is best. We should just make sure that it catches
> two consecutive names correctly:
>
> <START> Luise <END> <START> George Bauer <END>
>
> On Tue, Mar 20, 2012 at 12:07 AM, James Kosin <[email protected]> wrote:
>
>> Thanks....
>>
>> We have some collateral damage: the NameSampleDataStreamTest is
>> failing....
>> I'm trying to fix all this locally as well. yes...
>> Although your idea looks better than the getEnd() method I was using
>> and checked in... I may change it.
>>
>> James
>>
>> On 3/19/2012 10:51 PM, [email protected] wrote:
>>> Looks like we were checking it at the same time.
>>>
>>> Yes, I got confused because of this log. Actually, this error is caused by
>>> an issue in the corpus. It is not correctly annotated:
>>>
>>> ... to visit us, <START> Luise <END> and <START> George Bauer <END>.
>>> The <END> was not caught because of the '.', and the corpus parser got
>>> confused.
>>>
>>>
>>> On Mon, Mar 19, 2012 at 11:41 PM, James Kosin <[email protected]> wrote:
>>>> William,
>>>>
>>>> I found a problem with the longest match... I wasn't jumping over the
>>>> match after adding it to the spans.  This could cause George Bauer and
>>>> Bauer to be returned as two entries if Bauer was in the dictionary.
>>>>
>>>> I'm a little confused by the output I'm getting now:
>>>> -----
>>>> Running opennlp.tools.namefind.DictionaryNameFinderEvaluatorTest
>>>> Expected: {
>>>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>>>> visit us, <START:default> Luise <END> and George Bauer <END>.}
>>>> Predicted: {
>>>> Since then, our guests have to ring at Veilchenstraße 11 if they want to
>>>> visit us, <START:default> Luise <END> and <START:default> George <END>
>>>> Bauer <END>.}
>>>> False positives: {
>>>> [George]
>>>> } False negatives: {
>>>> []
>>>> }
>>>> ----
>>>> after adding a line assertTrue(fmeasure.getFmeasure() == 1); to the test
>>>> file...?
>>>>
>>>>
>>>>
>>>>
>>>> On 3/19/2012 10:11 PM, William Colen (Commented) (JIRA) wrote:
>>>>>     [ https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233124#comment-13233124 ]
>>>>> William Colen commented on OPENNLP-471:
>>>>> ---------------------------------------
>>>>>
>>>>> Oops... sorry! I am debugging the code and might have found the real
>>>>> reason for the previous output. I will investigate it further.
>>>>>> DictionaryNameFinder has HASHing issues
>>>>>> ---------------------------------------
>>>>>>
>>>>>>                 Key: OPENNLP-471
>>>>>>                 URL: https://issues.apache.org/jira/browse/OPENNLP-471
>>>>>>             Project: OpenNLP
>>>>>>          Issue Type: Bug
>>>>>>          Components: Name Finder
>>>>>>            Reporter: James Kosin
>>>>>>            Assignee: James Kosin
>>>>>>              Labels: dictionary, namefinder
>>>>>>             Fix For: tools-1.5.3
>>>>>>
>>>>>>
>>>>>> The DictionaryNameFinder has trouble finding multi-token names when the
>>>>>> find() method searches the dictionary one token at a time and the
>>>>>> dictionary has no single-token (or shorter) match available.
>>>>>> A dictionary containing {"folic", "acid"} but no entry for {"folic"}
>>>>>> will cause the find() method to completely miss the fact that a longer
>>>>>> match is possible.
>>>>>> Thanks to Jim for pushing this and to my debugging skills for finding it.
>>>>>> Three possibilities come to mind:
>>>>>> 1)  One I don't really like: we turn it into a larger problem by trying
>>>>>> longer matches when shorter ones don't match.  Unfortunately, this
>>>>>> quickly turns into a race to see who can wait longer.
>>>>>> 2)  A way of returning a possible match that may need exploring, or a
>>>>>> look-ahead type of system that says: we don't match "folic", but if
>>>>>> "acid" follows "folic", we have a match for that in the dictionary.
>>>>>> 3)  Leave it as is and modify the dictionary to add the shorter terms,
>>>>>> maybe marking them as not-a-valid-entry so we know we need a longer
>>>>>> match.
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> If you think it was sent incorrectly, please contact your JIRA administrators:
>>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>>>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
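
The two behaviours discussed in the thread (finding {"folic", "acid"}
when there is no entry for {"folic"}, and jumping over a match so that
"George Bauer" and "Bauer" are not both returned) can be sketched as
below. This is only a minimal illustration, not the code committed to
SVN and not the OpenNLP API: the class name LongestMatchSketch, the
find() signature, the maxLen parameter, and the Set<List<String>>
dictionary are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; NOT the OpenNLP DictionaryNameFinder source.
class LongestMatchSketch {

    // Returns token spans as {start, end} pairs, with end exclusive.
    // maxLen is the assumed length of the longest dictionary entry.
    static List<int[]> find(String[] tokens, Set<List<String>> dictionary, int maxLen) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            int limit = Math.min(tokens.length, start + maxLen);
            // Try the longest candidate first, so a multi-token entry is
            // found even when its first token alone is not in the dictionary.
            for (int end = limit; end > start; end--) {
                List<String> candidate =
                        Arrays.asList(Arrays.copyOfRange(tokens, start, end));
                if (dictionary.contains(candidate)) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd > 0) {
                spans.add(new int[] {start, matchedEnd});
                start = matchedEnd; // jump over the match: no nested "Bauer" span
            } else {
                start++;            // no entry starts here; move on one token
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dict = new HashSet<>();
        dict.add(Arrays.asList("Luise"));
        dict.add(Arrays.asList("George", "Bauer"));
        dict.add(Arrays.asList("Bauer"));
        String[] tokens = {"visit", "us", ",", "Luise", "and", "George", "Bauer", "."};
        // Prints "3..4" (Luise) and "5..7" (George Bauer); "Bauer" alone is skipped.
        for (int[] span : find(tokens, dict, 2)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

Trying the longest candidate first costs up to maxLen dictionary lookups
per token, which is the "race to see who can wait longer" concern from
option 1 of the issue; bounding the search by the longest entry keeps
that cost fixed in this sketch.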
