[ https://issues.apache.org/jira/browse/STANBOL-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689193#comment-16689193 ]

Ayeshmantha commented on STANBOL-320:
-------------------------------------

If you don't mind, could we chat about this task through some channel? That 
would be really helpful :)

> Named Entity detection engine should filter out some obviously wrong text 
> annotations
> -------------------------------------------------------------------------------------
>
>                 Key: STANBOL-320
>                 URL: https://issues.apache.org/jira/browse/STANBOL-320
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancement Engines
>            Reporter: Olivier Grisel
>            Assignee: Rafa Haro
>            Priority: Major
>
> OpenNLP tends to return really weird results from time to time. For instance:
> "The researchers found the liver expresses higher levels of the gene encoding 
> "selenoprotein P" (SEPP1) in people with type 2 diabetes - those with more 
> insulin resistance." outputs a Person TextAnnotation for the mention 'P "'. 
> Note the double quote that is included as part of the mention and the 
> additional whitespace separator, probably inserted by a confused detokenizer.
> Here is another example:
> "We are all very excited for Rahm as he takes on a new challenge for which he 
> is extraordinarily well qualified," said the president. Obama appointed 
> political consultant and senior advisor Pete Rouse as interim chief, calling 
> Rouse "a skillful problem-solver" and a "wise, skillful and long-time 
> counselor." also outputs 'Rouse "' as a Person annotation. This is again 
> caused by bad handling of quotation marks.
> I would like to use this JIRA issue to collect the most common annotation 
> mistakes that could be filtered out using ad-hoc Java code directly inside the 
> enhancement engine.
> For the two previous cases, removing the quotation marks and filtering out 
> single-letter names should be enough. There might be other cases that don't 
> match this simple pattern, though.
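The two fixes proposed in the issue (stripping quotation marks, dropping single-letter names) could be prototyped as a small post-processing step applied to each detected mention. This is a minimal sketch, not Stanbol or OpenNLP code: the class `MentionFilter`, its nested `Mention` type, the example offsets and the exact set of stripped characters are all hypothetical. It assumes the engine has each mention's covered text plus its character offsets in the source, and it does not show how the filter would be wired into the enhancement engine.

```java
import java.util.Optional;

/**
 * Hypothetical post-processing filter for person-name mentions returned by an
 * NER model. Not part of the Stanbol or OpenNLP APIs; all names are illustrative.
 */
public final class MentionFilter {

    /** A detected mention: the covered text plus its character offsets in the source. */
    public static final class Mention {
        public final String text;
        public final int start; // inclusive
        public final int end;   // exclusive

        public Mention(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed set of quote characters to strip: ASCII and typographic quotes.
    private static final String QUOTES = "\"'\u201C\u201D\u2018\u2019";

    private static boolean isStrippable(char c) {
        return Character.isWhitespace(c) || QUOTES.indexOf(c) >= 0;
    }

    /**
     * Trims quotation marks and whitespace from both ends of the mention,
     * shifting the offsets by the same amounts, and drops the mention if at
     * most one character remains.
     */
    public static Optional<Mention> clean(Mention m) {
        int from = 0;
        int to = m.text.length();
        while (from < to && isStrippable(m.text.charAt(from))) {
            from++;
        }
        while (to > from && isStrippable(m.text.charAt(to - 1))) {
            to--;
        }
        String cleaned = m.text.substring(from, to);
        // Filter single-character names such as the 'P' left over from 'P "'.
        if (cleaned.codePointCount(0, cleaned.length()) <= 1) {
            return Optional.empty();
        }
        return Optional.of(new Mention(cleaned, m.start + from, m.start + to));
    }

    public static void main(String[] args) {
        // Example offsets are made up. 'Rouse "' becomes 'Rouse'; end moves from 107 to 105.
        Mention rouse = clean(new Mention("Rouse \"", 100, 107)).get();
        System.out.println(rouse.text + " [" + rouse.start + ", " + rouse.end + ")");
        // 'P "' is reduced to a single letter and filtered out, so this prints false.
        System.out.println(clean(new Mention("P \"", 40, 43)).isPresent());
    }
}
```

Moving the offsets together with the trimmed text keeps the cleaned annotation pointing at the right span of the original content. As the issue notes, mistakes that don't match this pattern would need additional rules.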



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
