[ https://issues.apache.org/jira/browse/STANBOL-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689193#comment-16689193 ]
Ayeshmantha commented on STANBOL-320:
-------------------------------------

If you don't mind, any medium where I could chat with you about this task would be really helpful :)

> Named Entity detection engine should filter out some obviously wrong text annotations
> -------------------------------------------------------------------------------------
>
>                 Key: STANBOL-320
>                 URL: https://issues.apache.org/jira/browse/STANBOL-320
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancement Engines
>            Reporter: Olivier Grisel
>            Assignee: Rafa Haro
>            Priority: Major
>
> OpenNLP tends to return really weird results from time to time. For instance:
> "The researchers found the liver expresses higher levels of the gene encoding
> "selenoprotein P" (SEPP1) in people with type 2 diabetes - those with more
> insulin resistance." outputs a Person TextAnnotation for the mention 'P "' =>
> note the double quote that is included as part of the mention and the
> additional whitespace separator, probably inserted by a confused detokenizer.
>
> Here is another example:
> "We are all very excited for Rahm as he takes on a new challenge for which he
> is extraordinarily well qualified," said the president. Obama appointed
> political consultant and senior advisor Pete Rouse as interim chief, calling
> Rouse "a skillful problem-solver" and a "wise, skillful and long-time
> counselor." => outputs 'Rouse "' as a Person annotation as well. This is
> again a confusion caused by bad handling of quotation marks.
>
> I would like to use this jira issue to collect the most common annotation
> mistakes that could be filtered using ad-hoc Java code directly inside the
> enhancement engine.
>
> For the two previous cases, removing the quotation marks and filtering out
> single-letter names should be enough. There might be other cases that don't
> match this simple pattern though.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
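
For reference, the filter suggested in the quoted description (remove quotation marks, then drop single-letter names) could look roughly like the sketch below. The class and method names are hypothetical and are not part of the Stanbol or OpenNLP APIs. The sketch only handles the straight double quote seen in the two examples and assumes the mention is available as a plain String.

```java
import java.util.Optional;

/** Hypothetical helper; not an existing Stanbol or OpenNLP class. */
final class MentionFilter {

    private MentionFilter() {}

    /**
     * Strips double quotes and surrounding whitespace from a mention.
     * Returns an empty Optional when fewer than two characters remain,
     * meaning the annotation should be discarded.
     */
    static Optional<String> clean(String mention) {
        if (mention == null) {
            return Optional.empty();
        }
        // Remove straight double quotes, then the whitespace left around them.
        String cleaned = mention.replace("\"", "").trim();
        // Filter out empty results and single-letter "names" such as 'P'.
        if (cleaned.length() < 2) {
            return Optional.empty();
        }
        return Optional.of(cleaned);
    }
}
```

With this sketch, the mention 'P "' from the first example is rejected, and 'Rouse "' from the second example is kept as 'Rouse'. Single quotes are deliberately left alone so that names containing an apostrophe are not altered.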