Re: How to improve NER results in Stanbol

Dileepa Jayakody Wed, 27 Nov 2013 21:58:37 -0800

Hi Rafa, Rupert,

Thanks a lot for your input. I will look at the options you have suggested.
However, in the first phase of my project I don't require entity-linking
from entity-hub because many of the entities mentioned in the content I
submit will not be available in dbpedia. Therefore currently I also don't
require dbpediaLinking, entityhubExtraction engines in the default chain
I'm using. I will look at implementing a custom-vocab in the second phase
of the project for entity-linking and disambiguation purposes.


At the moment, I am focusing on improving the accuracy of
named-entity-recognition using NLP techniques. So I think opennlp-chunker
based improvements will be very helpful at this point.
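
As a rough way to check whether a change actually helps, precision and
recall can be computed per entity type against a hand-labelled expectation.
The sketch below is only an illustration in Python: the function name
`score` and the hand-labelled set are my own assumptions, and the "found"
values are the organizations reported for my first test sentence (quoted
further down in this thread).

```python
def score(found, expected):
    """Return (precision, recall) for two sets of entity mentions."""
    found, expected = set(found), set(expected)
    correct = found & expected
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall

# Organizations returned by the default chain for the first test sentence.
found_orgs = {
    "Barclays",
    "BT Group",
    "Finance Director Chris Lucas and Group General Counsel",
}
# Hand-labelled expectation (an assumption: only Barclays is an organization).
expected_orgs = {"Barclays"}

precision, recall = score(found_orgs, expected_orgs)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.33 recall=1.00
```

Running the same comparison after each chain or model change would show
whether the spurious organizations go away without losing the correct ones.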

Do you think the accuracy of NER will be improved if I also associate
entitylinking with dbpedia, dbpedia-fst-linking?
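
To compare the chains on the same content, each one can be addressed by
name over the enhancer's REST interface. The sketch below only builds the
requests. It assumes a default launcher on localhost:8080, the
`/enhancer/chain/<name>` path and a chain named "default"; the helper name
`build_request` is hypothetical. Actually sending them (for example with
`urllib.request.urlopen`) needs a running Stanbol instance.

```python
import urllib.request

# Assumed address of a locally running default launcher.
BASE_URL = "http://localhost:8080/enhancer/chain/"

def build_request(chain, text):
    """Build a POST request that submits plain text to the named chain."""
    return urllib.request.Request(
        BASE_URL + chain,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "application/rdf+xml"},
        method="POST",
    )

content = (
    "Barclays has appointed Shaygan Kheradpir to the role of "
    "Chief Operations and Technology Officer."
)
# Chain names are assumptions taken from this thread.
for chain in ("default", "dbpedia-fst-linking"):
    req = build_request(chain, content)
    print(req.get_method(), req.full_url)
```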

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
[email protected]> wrote:

> Hi Dileepa,
>
> I would suggest you also test with a chain that uses Entity Linking
> instead of Named Entity Linking. Have you tried the
> "dbpedia-fst-linking" chain? This one is also configured in the
> default launcher. Please also have a look at STANBOL-1211 [1] that
> brought a lot of improvements for EntityLinking if you include a
> chunker (e.g. the opennlp-chunker) in your chain.
>
> best
> Rupert
>
>
> [1] https://issues.apache.org/jira/browse/STANBOL-1211
>
> On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody
> <[email protected]> wrote:
> > Hi Rafa,
> >
> > I'm using the default chain;
> > tika
> > langdetect
> > opennlp-sentence
> > opennlp-token
> > opennlp-pos
> > opennlp-ner
> > dbpediaLinking
> > entityhubExtraction
> >
> > Thanks,
> > Dileepa
> >
> >
> > On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <[email protected]> wrote:
> >
> >> Hi Dileepa,
> >>
> >> Are you using only the OpenNLP NER engine or are you also including an
> >> Entity Linking engine?
> >>
> >>
> >> On 27/11/13 11:17, Dileepa Jayakody wrote:
> >>
> >>> Content:
> >>> Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
> >>> and Technology Officer. He will join the Executive Committee of Barclays
> >>> and report directly to Group Chief Executive Antony Jenkins.
> >>>
> >>> The above content doesn't identify *Barclays* as an organization but
> >>> identifies *Executive Committee of Barclays* as an organization.
> >>>
> >>>
> >>> How can we improve the accuracy of these results?
> >>>
> >>> Thanks,
> >>> Dileepa
> >>>
> >>>
> >>> On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <[email protected]> wrote:
> >>>
> >>>> [Typo corrected in the subject of the mail]
> >>>> ---------- Forwarded message ----------
> >>>> From: Dileepa Jayakody <[email protected]>
> >>>> Date: Wed, Nov 27, 2013 at 3:40 PM
> >>>> Subject: How to refinin NER results in Stanbol
> >>>> To: Stanbol Dev List <[email protected]>
> >>>>
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I have been running some load tests on Stanbol entity recognition, with a
> >>>> high load of content extracted from web articles and stored in a Solr
> >>>> index.
> >>>>
> >>>> My objective is to achieve an efficient and accurate enhancement result
> >>>> for the content submitted.
> >>>>
> >>>> But I think some of the NER results obtained are not accurate.
> >>>>
> >>>> For example, I submit the content:
> >>>> Group Finance Director Chris Lucas and Group General Counsel Mark Harding
> >>>> to retire from Barclays
> >>>>
> >>>> I get the entity recognition results below from the default enhancement chain:
> >>>>
> >>>> People : Chris Lucas, Mark Harding
> >>>> Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and
> >>>> Group General Counsel*
> >>>>
> >>>>
> >>>> The highlighted NERs for organizations above are inaccurate results.
> >>>> BT Group is not mentioned in the content, and the result *Finance
> >>>> Director Chris Lucas and Group General Counsel* is not an organization
> >>>> but rather a phrase.
> >>>> Further, if I add a full stop (.) to the end of the sentence, "Barclays" is
> >>>> not recognized as an Organization.
> >>>>
> >>>> I think we need to improve these results in Stanbol NER. Can we tweak
> >>>> the OpenNLP-NER component for this?
> >>>>
> >>>> Any ideas/pointers on how to refine these enhancement results will be
> >>>> immensely helpful.
> >>>> I'm looking for a way to improve the accuracy of the results as much as
> >>>> possible.
> >>>>
> >>>> Thanks,
> >>>> Dileepa
> >>>>
> >>>>
> >>>>
> >>
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
