Hi Dileepa,

Indeed, using the OpenNLP NER engine I also obtain a TextAnnotation with the selected-text "Executive Committee of Barclays", although its confidence is low (0.40117949264383646), which you might want to take into account. The only way to improve OpenNLP NER itself is to build better NER models, and for that you need better and/or bigger training data. If you check the OpenNLP documentation, you will find that the English NER model was trained on freely available, manually annotated corpora, mainly news articles. Consequently, the OpenNLP name finder is expected to work best on news documents, especially those using linguistic structures similar to the ones in the training data. In any case, like any other statistical tool, it will never be perfect.
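
As a rough illustration of taking the confidence into account, here is a minimal Python sketch that drops low-confidence annotations after enhancement. The Mention class, the filter_by_confidence helper and the 0.5 threshold are hypothetical names and values chosen for this example; they are not part of the Stanbol or OpenNLP APIs, and the second confidence value is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical, simplified stand-in for a TextAnnotation."""
    text: str
    entity_type: str
    confidence: float


def filter_by_confidence(mentions, threshold=0.5):
    """Keep only the mentions whose confidence reaches the threshold."""
    return [m for m in mentions if m.confidence >= threshold]


mentions = [
    # Confidence reported in this thread for the doubtful annotation.
    Mention("Executive Committee of Barclays", "Organization", 0.40117949264383646),
    # Illustrative value only, not taken from a real run.
    Mention("Shaygan Kheradpir", "Person", 0.9),
]

kept = filter_by_confidence(mentions, threshold=0.5)
for m in kept:
    print(m.text, m.entity_type, m.confidence)
```

A suitable threshold depends on your content, so it is worth tuning it against a small sample that you have annotated by hand.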

Improving named entity recognition is a hard task. The ideal solution is to provide training data for your domain. You can also try some workarounds, for example merging the results of different NER engines (OpenNLP, Stanford, Freeling...) or, at the Entity Linking level, combining the results of Named Entity Linking and Keyword Linking engines.
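
One possible way to merge the results of several NER engines is a simple majority vote: keep a mention only if enough engines agree on it. The Python sketch below assumes that each engine's output has already been reduced to (text, type) pairs; the merge_by_vote helper, the min_votes parameter and the sample outputs are invented for illustration and do not come from real engine runs.

```python
from collections import Counter


def merge_by_vote(results_per_engine, min_votes=2):
    """Return the (text, type) pairs reported by at least min_votes engines."""
    votes = Counter()
    for mentions in results_per_engine.values():
        # set() ensures an engine votes at most once per (text, type) pair.
        for key in set(mentions):
            votes[key] += 1
    return sorted(key for key, count in votes.items() if count >= min_votes)


# Invented outputs, loosely based on the example sentence in this thread.
results = {
    "opennlp": [("Barclays", "Organization"), ("BT Group", "Organization"),
                ("Chris Lucas", "Person")],
    "stanford": [("Barclays", "Organization"), ("Chris Lucas", "Person"),
                 ("Mark Harding", "Person")],
    "freeling": [("Barclays", "Organization"), ("Mark Harding", "Person")],
}

merged = merge_by_vote(results, min_votes=2)
for text, entity_type in merged:
    print(entity_type, text)
```

Voting trades recall for precision: a spurious mention reported by a single engine is dropped, but so is a correct mention that only one engine finds.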

Cheers,
Rafa

On 27/11/13 11:28, Dileepa Jayakody wrote:
Hi Rafa,

I'm using the default chain:
tika
langdetect
opennlp-sentence
opennlp-token
opennlp-pos     
opennlp-ner
dbpediaLinking
entityhubExtraction

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:

Hi Dileepa,

Are you using only OpenNLP NER engine or are you also including an Entity
Linking engine?


On 27/11/13 11:17, Dileepa Jayakody wrote:

Content:
Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
and Technology Officer. He will join the Executive Committee of Barclays
and report directly to Group Chief Executive Antony Jenkins.

The above content doesn't identify *Barclays* as an organization but
identifies *Executive Committee of Barclays* as an organization.


How can we improve the accuracy of these results?

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
[Typo corrected in the subject of the mail]
---------- Forwarded message ----------
From: Dileepa Jayakody <dileepajayak...@gmail.com>
Date: Wed, Nov 27, 2013 at 3:40 PM
Subject: How to refinin NER results in Stanbol
To: Stanbol Dev List <dev@stanbol.apache.org>


Hi All,

I have been running some load tests on Stanbol entity recognition, with a
high load of content extracted from web articles and stored in a Solr
index.

My objective is to achieve an efficient and accurate enhancement result
for the content submitted.

But I think some of the NER results obtained are not accurate.

For example, I submit the content:
Group Finance Director Chris Lucas and Group General Counsel Mark Harding
to retire from Barclays

I get the entity recognition results below from the default enhancement chain:

People : Chris Lucas, Mark Harding
Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and
Group General Counsel*


The highlighted organization results above are inaccurate.
BT Group is not mentioned in the content, and the result *Finance
Director Chris Lucas and Group General Counsel* is not an organization
but rather a phrase.
Furthermore, if I add a full stop (.) to the end of the sentence, "Barclays"
is not recognized as an Organization.

I think we need to improve these results in Stanbol NER. Can we tweak
the OpenNLP-NER component for this?

Any ideas/pointers on how to refine these enhancement results will be
immensely helpful.
I'm looking for a way to improve the accuracy of the results as much as
possible.

Thanks,
Dileepa



