Re: How to improve NER results in Stanbol

Rupert Westenthaler Thu, 28 Nov 2013 22:19:58 -0800

Hi Dileepa

If you need to detect entities that are not part of the controlled
vocabularies, then there is no way around NER. And if you want good
results, there is no way around building your own models based on a
custom training set.
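
As a rough illustration of what such a training set looks like: the
OpenNLP name finder expects one tokenized sentence per line, with
entities wrapped in <START:type> ... <END> markers. The sketch below
writes one such line from token-level annotations. The helper name
to_opennlp_line and the (start, end, type) span layout are assumptions
made for this example; they are not part of OpenNLP or Stanbol.

```python
def to_opennlp_line(tokens, spans):
    """Render one sentence in the OpenNLP name finder training format.

    tokens: list of token strings for a single sentence.
    spans:  list of (start, end, type) token indices, end exclusive.
            Spans are assumed not to overlap (hypothetical input layout).
    """
    starts = {start: etype for start, end, etype in spans}
    ends = {end for start, end, etype in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the entity that stops before this token
        if i in starts:
            out.append(f"<START:{starts[i]}>")  # open a new entity here
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # entity that runs to the end of the sentence
    return " ".join(out)


tokens = ["Barclays", "has", "appointed", "Shaygan", "Kheradpir", "."]
spans = [(0, 1, "organization"), (3, 5, "person")]
print(to_opennlp_line(tokens, spans))
```

A real training set needs many such sentences, taken from content that
is similar to what you later want to enhance.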


If you need to detect Persons, Organizations and Places, you might have
a look at Stanford NLP with the Stanbol integration [1], as the models
provided by Stanford NLP are much better than those of OpenNLP.
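
To compare the results of different chains on the same text, you can
post the content to the chain-specific enhancer endpoint. The sketch
below only builds the request. The base URL (a launcher running on
localhost:8080) and the helper name build_enhancer_request are
assumptions for this example, and the chain name has to be one that is
actually configured in your launcher.

```python
import urllib.parse
import urllib.request


def build_enhancer_request(text, chain, base_url="http://localhost:8080"):
    """Build a POST request for the Stanbol enhancer endpoint of one chain.

    base_url is an assumption (a locally running launcher); chain must be
    the name of a chain configured in that launcher.
    """
    url = f"{base_url.rstrip('/')}/enhancer/chain/{urllib.parse.quote(chain)}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "text/turtle",  # ask for the enhancements as Turtle
        },
        method="POST",
    )


req = build_enhancer_request(
    "Group Finance Director Chris Lucas and Group General Counsel "
    "Mark Harding to retire from Barclays",
    "dbpedia-fst-linking",
)
print(req.get_method(), req.full_url)

# Sending the request needs a running Stanbol instance:
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

Running the same text through two chains and comparing the returned
annotations is a quick way to see which entities each chain finds.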

best
Rupert


[1] https://github.com/westei/stanbol-stanfordnlp

On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody
<[email protected]> wrote:
> Hi Rafa, Rupert,
>
> Thanks a lot for your input. I will look at the options you have suggested.
> However, in the first phase of my project I don't require entity-linking
> from entity-hub because many of the entities mentioned in the content I
> submit will not be available in dbpedia. Therefore currently I also don't
> require dbpediaLinking, entityhubExtraction engines in the default chain
> I'm using. I will look at implementing a custom-vocab in the second phase
> of the project for entity-linking and disambiguation purpose.
>
> At the moment, I focus on improving the accuracy of
> named-entity-recognition using NLP techniques. So I think opennlp-chunker
> based improvements will be very helpful at this point.
>
> Do you think the accuracy of NER will be improved if I also associate
> entitylinking with dbpedia, dbpedia-fst-linking?
>
> Thanks,
> Dileepa
>
>
> On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi Dileepa,
>>
>> I would suggest you also test with a chain that uses Entity Linking
>> instead of Named Entity Linking. Have you tried the
>> "dbpedia-fst-linking" chain? This one is also configured in the
>> default launcher. Please also have a look at STANBOL-1211 [1] that
>> brought a lot of improvements for EntityLinking if you include a
>> chunker (e.g. the opennlp-chunker) in your chain.
>>
>> best
>> Rupert
>>
>>
>> [1] https://issues.apache.org/jira/browse/STANBOL-1211
>>
>> On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody
>> <[email protected]> wrote:
>> > Hi Rafa,
>> >
>> > I'm using the default chain;
>> > tika
>> > langdetect
>> > opennlp-sentence
>> > opennlp-token
>> > opennlp-pos
>> > opennlp-ner
>> > dbpediaLinking
>> > entityhubExtraction
>> >
>> > Thanks,
>> > Dileepa
>> >
>> >
>> > On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <[email protected]> wrote:
>> >
>> >> Hi Dileepa,
>> >>
>> >> Are you using only OpenNLP NER engine or are you also including an
>> >> Entity Linking engine?
>> >>
>> >>
>> >> El 27/11/13 11:17, Dileepa Jayakody escribió:
>> >>
>> >>> Content:
>> >>> Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
>> >>> and Technology Officer. He will join the Executive Committee of Barclays
>> >>> and report directly to Group Chief Executive Antony Jenkins.
>> >>>
>> >>> The content above doesn't identify *Barclays* as an organization but
>> >>> identifies *Executive Committee of Barclays* as an organization.
>> >>>
>> >>>
>> >>> How can we improve the accuracy of these results?
>> >>>
>> >>> Thanks,
>> >>> Dileepa
>> >>>
>> >>>
>> >>> On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> [Typo corrected in the subject of the mail]
>> >>>> ---------- Forwarded message ----------
>> >>>> From: Dileepa Jayakody <[email protected]>
>> >>>> Date: Wed, Nov 27, 2013 at 3:40 PM
>> >>>> Subject: How to refinin NER results in Stanbol
>> >>>> To: Stanbol Dev List <[email protected]>
>> >>>>
>> >>>>
>> >>>> Hi All,
>> >>>>
>> >>>> I have been running some load tests on Stanbol entity recognition, with a
>> >>>> high load of content extracted from web articles and stored in a Solr
>> >>>> index.
>> >>>>
>> >>>> My objective is to achieve an efficient and accurate enhancement result
>> >>>> for the content submitted.
>> >>>>
>> >>>> But I think some of the NER results obtained are not accurate.
>> >>>>
>> >>>> For example, I submit the content:
>> >>>> Group Finance Director Chris Lucas and Group General Counsel Mark Harding
>> >>>> to retire from Barclays
>> >>>>
>> >>>> I get the entity recognition results below from the default enhancement chain:
>> >>>>
>> >>>> People : Chris Lucas, Mark Harding
>> >>>> Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and
>> >>>> Group General Counsel*
>> >>>>
>> >>>>
>> >>>> The highlighted NERs for organizations above are inaccurate results.
>> >>>> BT Group is not mentioned in the content, and the result *Finance
>> >>>> Director Chris Lucas and Group General Counsel* is not an organization
>> >>>> but rather a phrase.
>> >>>> Further, if I add a full stop (.) to the end of the sentence, "Barclays" is
>> >>>> not recognized as an Organization.
>> >>>>
>> >>>> I think we need to improve these results in Stanbol NER. Can we tweak
>> >>>> OpenNLP-NER component for this?
>> >>>>
>> >>>> Any ideas/pointers on how to refine these enhancement results will be
>> >>>> immensely helpful.
>> >>>> I'm looking for a way to improve the accuracy of the results as much as
>> >>>> possible.
>> >>>>
>> >>>> Thanks,
>> >>>> Dileepa
>> >>>>
>> >>>>
>> >>>>
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
