[Typo corrected in the subject of the mail]

---------- Forwarded message ----------
From: Dileepa Jayakody <dileepajayak...@gmail.com>
Date: Wed, Nov 27, 2013 at 3:40 PM
Subject: How to refinin NER results in Stanbol
To: Stanbol Dev List <dev@stanbol.apache.org>

Hi All,

I have been running some load tests on Stanbol entity recognition, with a high load of content extracted from web articles and stored in a Solr index. My objective is to get efficient and accurate enhancement results for the submitted content, but I think some of the NER results obtained are not accurate.

For example, when I submit the content:

Group Finance Director Chris Lucas and Group General Counsel Mark Harding to retire from Barclays

I get the entity recognition results below from the default enhancement chain:

People: Chris Lucas, Mark Harding
Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and Group General Counsel*

The highlighted organization results above are inaccurate. BT Group is not mentioned in the content, and *Finance Director Chris Lucas and Group General Counsel* is a phrase, not an organization. Further, if I add a full stop (.) to the end of the sentence, "Barclays" is no longer recognized as an Organization.

I think we need to improve these results in Stanbol NER. Can we tweak the OpenNLP-NER component for this? Any ideas or pointers on how to refine these enhancement results would be immensely helpful. I'm looking for a way to improve the accuracy of the results as much as possible.

Thanks,
Dileepa