Hi Dileepa,
Indeed, using only the OpenNLP NER engine I also get a TextAnnotation
with the selected-text "Executive Committee of Barclays", although its
confidence is low (0.40117949264383646), which is something you might
want to take into account. In any case, the only way to improve the
OpenNLP NER is to build better NER models, and for that you need better
and/or bigger training data. If you check the OpenNLP documentation,
you will find that the NER model for English was trained on freely
available, manually annotated corpora, mainly news articles.
Consequently, the OpenNLP name finder can be expected to work best on
news-style documents, especially those that use linguistic structures
similar to the ones in the training data. Like any other statistical
tool, it is never going to be perfect.

Improving named entity recognition is a hard task. The ideal solution is
to provide training data for your own domain. You can also try some
workarounds, for example merging the results of different NER engines
(OpenNLP, Stanford, Freeling...) or, at the Entity Linking level,
combining the results of the Named Entity Linking and Keyword Linking
engines.
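
To make the merging workaround a bit more concrete, here is a minimal
Python sketch. It is not Stanbol or OpenNLP code: the Mention class, the
merge_mentions function and the 0.5 threshold are hypothetical, and the
"stanford" and "freeling" mentions with their confidences are invented
for illustration. Only the 0.40 confidence for "Executive Committee of
Barclays" comes from the result above. It assumes the output of each
engine has already been reduced to (engine, text, label, start, end,
confidence) records. A mention is kept when at least two engines agree
on the same span and label, or when a single engine is confident enough.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One named-entity mention reported by one engine (hypothetical type)."""
    engine: str
    text: str
    label: str
    start: int
    end: int
    confidence: float


def merge_mentions(mentions, min_confidence=0.5, min_votes=2):
    """Keep mentions that several engines agree on, or that are confident enough."""
    # Group mentions that cover exactly the same span with the same label.
    groups = defaultdict(list)
    for m in mentions:
        groups[(m.start, m.end, m.label)].append(m)

    kept = []
    for group in groups.values():
        engines = {m.engine for m in group}
        best = max(group, key=lambda m: m.confidence)
        if len(engines) >= min_votes or best.confidence >= min_confidence:
            kept.append(best)
    return sorted(kept, key=lambda m: (m.start, m.end))


text = "He will join the Executive Committee of Barclays"
long_span = "Executive Committee of Barclays"
short_span = "Barclays"


def span(fragment):
    """Return the (start, end) character offsets of fragment in text."""
    start = text.find(fragment)
    return start, start + len(fragment)


mentions = [
    # Confidence taken from the OpenNLP result discussed above.
    Mention("opennlp", long_span, "Organization", *span(long_span), 0.40117949264383646),
    # Invented values, only to illustrate agreement between two engines.
    Mention("stanford", short_span, "Organization", *span(short_span), 0.90),
    Mention("freeling", short_span, "Organization", *span(short_span), 0.80),
]

merged = merge_mentions(mentions)
```

With these illustrative inputs the low-confidence long span is dropped
and "Barclays" is kept, because two engines agree on it. The threshold
and the voting rule would need tuning against your own content.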
Cheers,
Rafa
On 27/11/13 11:28, Dileepa Jayakody wrote:
Hi Rafa,
I'm using the default chain:
tika
langdetect
opennlp-sentence
opennlp-token
opennlp-pos
opennlp-ner
dbpediaLinking
entityhubExtraction
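
For reference, content could be sent to that chain over the Stanbol REST
interface roughly as follows. This is a minimal Python sketch, assuming
a local Stanbol launcher at http://localhost:8080 and that the chain is
registered under the name "default". build_enhance_request is a
hypothetical helper, not part of Stanbol, and the request is only built
here, not sent.

```python
import urllib.request


def build_enhance_request(text, base_url="http://localhost:8080", chain="default"):
    """Build a POST request for a named enhancement chain (hypothetical helper)."""
    return urllib.request.Request(
        # Assumed endpoint layout: <base>/enhancer/chain/<chain-name>
        url=f"{base_url}/enhancer/chain/{chain}",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            # Ask for the enhancement results as RDF in Turtle syntax.
            "Accept": "text/turtle",
        },
        method="POST",
    )


request = build_enhance_request(
    "Barclays has appointed Shaygan Kheradpir to the role of "
    "Chief Operations and Technology Officer."
)

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     rdf = response.read().decode("utf-8")
```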
Thanks,
Dileepa
On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:
Hi Dileepa,
Are you using only OpenNLP NER engine or are you also including an Entity
Linking engine?
On 27/11/13 11:17, Dileepa Jayakody wrote:
Content:
Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
and Technology Officer. He will join the Executive Committee of Barclays
and report directly to Group Chief Executive Antony Jenkins.
For the content above, *Barclays* is not identified as an organization,
but *Executive Committee of Barclays* is.
How can we improve the accuracy of these results?
Thanks,
Dileepa
On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
[Typo corrected in the subject of the mail]
---------- Forwarded message ----------
From: Dileepa Jayakody <dileepajayak...@gmail.com>
Date: Wed, Nov 27, 2013 at 3:40 PM
Subject: How to refinin NER results in Stanbol
To: Stanbol Dev List <dev@stanbol.apache.org>
Hi All,
I have been running some load tests on Stanbol entity recognition, with a
high load of content extracted from web articles and stored in a Solr
index.
My objective is to achieve an efficient and accurate enhancement result
for the content submitted.
However, I think some of the NER results obtained are not accurate.
For example, if I submit the content:
Group Finance Director Chris Lucas and Group General Counsel Mark Harding
to retire from Barclays
I get the entity recognition results below from the default enhancement chain:
People : Chris Lucas, Mark Harding
Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and
Group General Counsel*
The highlighted organization results above are inaccurate.
BT Group is not mentioned in the content, and *Finance Director Chris
Lucas and Group General Counsel* is not an organization but a phrase.
Furthermore, if I add a full stop (.) to the end of the sentence,
"Barclays" is no longer recognized as an Organization.
I think we need to improve these results in Stanbol NER. Can we tweak
the OpenNLP NER component for this?
Any ideas/pointers on how to refine these enhancement results will be
immensely helpful.
I'm looking for a way to improve the accuracy of the results as much as
possible.
Thanks,
Dileepa