Re: How to improve NER results in Stanbol

Rafa Haro Wed, 27 Nov 2013 03:03:48 -0800

Hi Dileepa,

Indeed, using the OpenNLP NER engine I am also getting a TextAnnotation
with the selected-text "Executive Committee of Barclays", although its
confidence level is low (0.40117949264383646), a feature that you might
want to take into account. Anyway, the only way to improve the OpenNLP NER
would be to build better NER models, and for that you need to provide
better and/or bigger training data. If you check the OpenNLP
documentation, you will find that the NER model for English has been
trained on freely available, manually annotated corpora, mainly from news
articles. Consequently, the OpenNLP name finder is expected to work better
with "news documents", especially those using linguistic structures
similar to the ones in the training data. Anyway, like any other
statistics-based tool, it is never going to be perfect.
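
As a rough illustration of taking the confidence into account, here is a
minimal Python sketch that drops low-confidence annotations on the client
side. The dict layout, the function name filter_by_confidence and the 0.5
threshold are assumptions made for this example, not Stanbol's API, and
the second annotation (with its 0.9 confidence) is invented to show a
mention that survives the filter.

```python
from typing import Dict, List


def filter_by_confidence(annotations: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep only annotations whose confidence is at or above the threshold."""
    # Annotations without a confidence value are treated as 0.0 and dropped.
    return [a for a in annotations if a.get("confidence", 0.0) >= threshold]


# Hypothetical, already-parsed enhancement results (layout is an assumption).
annotations = [
    {
        "selected_text": "Executive Committee of Barclays",
        "type": "Organization",
        "confidence": 0.40117949264383646,  # value reported in this thread
    },
    {
        "selected_text": "Shaygan Kheradpir",
        "type": "Person",
        "confidence": 0.9,  # invented value, for illustration only
    },
]

kept = filter_by_confidence(annotations, threshold=0.5)
print([a["selected_text"] for a in kept])
```

The right threshold depends on your content, so it is probably worth
tuning it against a small set of documents you have checked by hand.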

Improving named entity recognition is a hard task. The ideal situation is
to provide training data for your domain. You can also try some
workarounds, for example mixing/merging the results of different NER
engines (OpenNLP, Stanford, Freeling...) or even, at the Entity Linking
level, combining the results of the Named Entity Linking and Keyword
Linking engines.
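
One possible way to merge the output of several NER engines is a simple
vote: keep a mention only if enough engines agree on the same span and
type. The Python sketch below assumes each engine's output has already
been reduced to (start, end, type) tuples. The names merge_by_vote and
span, and the three engine outputs, are made up for the example and do not
reflect what any real engine returned.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A mention is (start offset, end offset, entity type).
Mention = Tuple[int, int, str]


def span(text: str, phrase: str, entity_type: str) -> Mention:
    """Build a mention from the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase), entity_type)


def merge_by_vote(engine_outputs: Iterable[Set[Mention]], min_votes: int = 2) -> List[Mention]:
    """Keep mentions reported identically by at least min_votes engines."""
    votes: Counter = Counter()
    for mentions in engine_outputs:
        # set() ensures each engine votes at most once per mention.
        votes.update(set(mentions))
    return sorted(m for m, n in votes.items() if n >= min_votes)


text = "He will join the Executive Committee of Barclays"

# Hypothetical outputs of three engines for the sentence above.
engine_a = {span(text, "Executive Committee of Barclays", "ORG")}
engine_b = {span(text, "Barclays", "ORG")}
engine_c = {span(text, "Barclays", "ORG")}

merged = merge_by_vote([engine_a, engine_b, engine_c], min_votes=2)
print(merged)  # [(40, 48, 'ORG')]
```

Exact-span voting is strict: a mention is lost when engines disagree
slightly on its boundaries, so an overlap-based match may work better in
practice.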


Cheers,
Rafa

On 27/11/13 11:28, Dileepa Jayakody wrote:

Hi Rafa,

I'm using the default chain:
tika
langdetect
opennlp-sentence
opennlp-token
opennlp-pos
opennlp-ner
dbpediaLinking
entityhubExtraction

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <[email protected]> wrote:

Hi Dileepa,

Are you using only OpenNLP NER engine or are you also including an Entity
Linking engine?


On 27/11/13 11:17, Dileepa Jayakody wrote:

Content:
Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
and Technology Officer. He will join the Executive Committee of Barclays
and report directly to Group Chief Executive Antony Jenkins.

The content above doesn't identify *Barclays* as an organization but
identifies *Executive Committee of Barclays* as an organization.


How can we improve the accuracy of these results?

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <[email protected]> wrote:
[Typo corrected in the subject of the mail]
---------- Forwarded message ----------
From: Dileepa Jayakody <[email protected]>
Date: Wed, Nov 27, 2013 at 3:40 PM
Subject: How to refinin NER results in Stanbol
To: Stanbol Dev List <[email protected]>

Hi All,

I have been running some load tests on Stanbol entity recognition, with a
high load of content extracted from web articles and stored in a Solr
index.

My objective is to achieve an efficient and accurate enhancement result
for the content submitted.

But I think some of the NER results obtained are not accurate.

For example, I submit the content:
Group Finance Director Chris Lucas and Group General Counsel Mark Harding
to retire from Barclays

I get the following entity recognition results from the default
enhancement chain:

People : Chris Lucas, Mark Harding
Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and
Group General Counsel*

The highlighted NERs for organizations above are inaccurate results.
BT Group is not mentioned in the content, and the result *Finance
Director Chris Lucas and Group General Counsel* is not an organization,
but rather a phrase.
Further, if I add a full stop (.) to the end of the sentence, "Barclays"
is not recognized as an Organization.

I think we need to improve these results in Stanbol NER. Can we tweak
the OpenNLP-NER component for this?

Any ideas/pointers on how to refine these enhancement results will be
immensely helpful.
I'm looking for a way to improve the accuracy of the results as much as
possible.

Thanks,
Dileepa
