Re: How to improve NER results in Stanbol

Dileepa Jayakody Fri, 29 Nov 2013 05:57:24 -0800

Hi Rupert,

Thanks again for your suggestions.
I cloned and built the stanbol-stanfordnlp project you suggested and
executed the run command [1] below in a separate directory. But the server
startup doesn't complete; it hangs at the log entry "Reading TokensRegex
rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt".


Any ideas? Can I edit the configuration to skip the above TokensRegex rules
and start the server?

Thanks,
Dileepa

[1]
dileepa@dileepa-laptop2:~/apache/stanfordNLP_stanbol/server$ java -Xmx1g -jar at.salzburgresearch.stanbol.stanbol.enhancer.nlp.stanford.server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [6.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.9 sec].
Initialization JollyDayHoliday for sutime
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Nov 29, 2013 7:14:24 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt



On Fri, Nov 29, 2013 at 11:48 AM, Rupert Westenthaler <
[email protected]> wrote:

> Hi Dileepa
>
> If you need to detect entities that are not part of the controlled
> vocabularies, then there is no way around NER. If you want good
> results, there is no way around building your own models based on a
> custom training set.
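Rupert's point about custom training data can be made concrete for the OpenNLP engines used in the default chain (opennlp-ner). OpenNLP's name finder training format is one whitespace-tokenized sentence per line, with entity mentions wrapped in `<START:type> ... <END>` markers. The following is a minimal Java sketch, assuming you already have tokens and token-index spans from your own annotation process; `OpenNlpTrainingLine`, `Span` and `toOpenNlpLine` are hypothetical names, and nested or overlapping spans are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that renders one annotated sentence in OpenNLP's name finder format. */
public class OpenNlpTrainingLine {

    /** A hypothetical entity span over token indices [start, end). */
    public static final class Span {
        final int start;
        final int end;
        final String type;

        public Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Assumes the spans do not overlap; OpenNLP's format cannot express nesting. */
    public static String toOpenNlpLine(List<String> tokens, List<Span> spans) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            // Open a marker before the first token of any span starting here.
            for (Span s : spans) {
                if (s.start == i) {
                    out.add("<START:" + s.type + ">");
                }
            }
            out.add(tokens.get(i));
            // Close the marker after the last token of any span ending here.
            for (Span s : spans) {
                if (s.end == i + 1) {
                    out.add("<END>");
                }
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Barclays", "has", "appointed", "Shaygan", "Kheradpir",
                "to", "the", "role", "of", "Chief", "Operations", "and", "Technology",
                "Officer", ".");
        List<Span> spans = List.of(new Span(0, 1, "organization"), new Span(3, 5, "person"));
        System.out.println(toOpenNlpLine(tokens, spans));
    }
}
```

Run on the Barclays sentence from this thread, it prints `<START:organization> Barclays <END> has appointed <START:person> Shaygan Kheradpir <END> to the role of Chief Operations and Technology Officer .` A file of such lines is the kind of input a custom OpenNLP NER model would be trained on.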
>
> If you need to detect persons, organizations and places, you might have
> a look at Stanford NLP with its Stanbol integration [1], as the models
> provided by Stanford NLP are much better than those of OpenNLP.
>
> best
> Rupert
>
>
> [1] https://github.com/westei/stanbol-stanfordnlp
>
> On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody
> <[email protected]> wrote:
> > Hi Rafa, Rupert,
> >
> > Thanks a lot for your input. I will look at the options you have
> > suggested.
> > However, in the first phase of my project I don't require entity linking
> > from the Entityhub, because many of the entities mentioned in the content
> > I submit will not be available in DBpedia. Therefore I currently also
> > don't require the dbpediaLinking and entityhubExtraction engines in the
> > default chain I'm using. I will look at implementing a custom vocabulary
> > in the second phase of the project for entity linking and disambiguation.
> >
> > At the moment, I am focusing on improving the accuracy of named entity
> > recognition using NLP techniques, so I think the opennlp-chunker based
> > improvements will be very helpful at this point.
> >
> > Do you think NER accuracy will improve if I also use entity linking
> > against DBpedia, e.g. the dbpedia-fst-linking chain?
> >
> > Thanks,
> > Dileepa
> >
> >
> > On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
> > [email protected]> wrote:
> >
> >> Hi Dileepa,
> >>
> >> I would suggest you also test with a chain that uses Entity Linking
> >> instead of Named Entity Linking. Have you tried the
> >> "dbpedia-fst-linking" chain? This one is also configured in the
> >> default launcher. Please also have a look at STANBOL-1211 [1], which
> >> brought a lot of improvements to EntityLinking when you include a
> >> chunker (e.g. the opennlp-chunker) in your chain.
> >>
> >> best
> >> Rupert
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/STANBOL-1211
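To compare chains as suggested above, the same text can be submitted to different enhancement chains over Stanbol's RESTful enhancer endpoint. Below is a minimal Java 11 sketch that only builds the requests. It assumes a local launcher on port 8080, that `/enhancer/chain/{name}` selects a named chain, and that plain `/enhancer` uses the default chain; `ChainRequest` and `forChain` are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ChainRequest {

    // Assumption: a local Stanbol launcher listening on port 8080.
    static final String ENHANCER = "http://localhost:8080/enhancer";

    /**
     * Builds (but does not send) a POST of plain text to an enhancement chain.
     * A null chainName targets the plain /enhancer endpoint, assumed here to
     * use the default chain.
     */
    public static HttpRequest forChain(String chainName, String text) {
        String url = (chainName == null) ? ENHANCER : ENHANCER + "/chain/" + chainName;
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/plain; charset=UTF-8")
                // Ask for the enhancement results serialized as Turtle.
                .header("Accept", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String text = "Group Finance Director Chris Lucas and Group General Counsel"
                + " Mark Harding to retire from Barclays";
        // Same content, two chains: the default chain and dbpedia-fst-linking.
        for (String chain : new String[] {null, "dbpedia-fst-linking"}) {
            HttpRequest req = forChain(chain, text);
            System.out.println(req.method() + " " + req.uri());
        }
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())` and compare the entities each chain returns for the same sentence.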
> >>
> >> On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody
> >> <[email protected]> wrote:
> >> > Hi Rafa,
> >> >
> >> > I'm using the default chain:
> >> > tika
> >> > langdetect
> >> > opennlp-sentence
> >> > opennlp-token
> >> > opennlp-pos
> >> > opennlp-ner
> >> > dbpediaLinking
> >> > entityhubExtraction
> >> >
> >> > Thanks,
> >> > Dileepa
> >> >
> >> >
> >> > On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <[email protected]> wrote:
> >> >
> >> >> Hi Dileepa,
> >> >>
> >> >> Are you using only the OpenNLP NER engine, or are you also including
> >> >> an Entity Linking engine?
> >> >>
> >> >>
> >> >> On 27/11/13 11:17, Dileepa Jayakody wrote:
> >> >>
> >> >>> Content:
> >> >>> Barclays has appointed Shaygan Kheradpir to the role of Chief
> >> >>> Operations and Technology Officer. He will join the Executive
> >> >>> Committee of Barclays and report directly to Group Chief Executive
> >> >>> Antony Jenkins.
> >> >>>
> >> >>> The above content doesn't identify *Barclays* as an organization but
> >> >>> identifies *Executive Committee of Barclays* as an organization.
> >> >>>
> >> >>>
> >> >>> How can we improve the accuracy of these results?
> >> >>>
> >> >>> Thanks,
> >> >>> Dileepa
> >> >>>
> >> >>>
> >> >>> On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <
> >> >>> [email protected]> wrote:
> >> >>>> [Typo corrected in the subject of the mail]
> >> >>>> ---------- Forwarded message ----------
> >> >>>> From: Dileepa Jayakody <[email protected]>
> >> >>>> Date: Wed, Nov 27, 2013 at 3:40 PM
> >> >>>> Subject: How to refinin NER results in Stanbol
> >> >>>> To: Stanbol Dev List <[email protected]>
> >> >>>>
> >> >>>>
> >> >>>> Hi All,
> >> >>>>
> >> >>>> I have been running some load tests on Stanbol entity recognition,
> >> >>>> with a high load of content extracted from web articles and stored
> >> >>>> in a Solr index.
> >> >>>>
> >> >>>> My objective is to achieve efficient and accurate enhancement
> >> >>>> results for the submitted content.
> >> >>>>
> >> >>>> But I think some of the NER results obtained are not accurate.
> >> >>>>
> >> >>>> For example, I submit the content:
> >> >>>> Group Finance Director Chris Lucas and Group General Counsel Mark
> >> >>>> Harding to retire from Barclays
> >> >>>>
> >> >>>> I get the following entity recognition results from the default
> >> >>>> enhancement chain:
> >> >>>>
> >> >>>> People: Chris Lucas, Mark Harding
> >> >>>> Organization: Barclays, *BT Group*, *Finance Director Chris Lucas
> >> >>>> and Group General Counsel*
> >> >>>>
> >> >>>>
> >> >>>> The highlighted organization results above are inaccurate.
> >> >>>> BT Group is not mentioned in the content, and *Finance Director
> >> >>>> Chris Lucas and Group General Counsel* is not an organization but
> >> >>>> rather a phrase.
> >> >>>> Furthermore, if I add a full stop (.) to the end of the sentence,
> >> >>>> "Barclays" is not recognized as an organization.
> >> >>>>
> >> >>>> I think we need to improve these results in Stanbol NER. Can we
> >> >>>> tweak the OpenNLP NER component for this?
> >> >>>>
> >> >>>> Any ideas or pointers on how to refine these enhancement results
> >> >>>> would be immensely helpful. I'm looking for a way to improve the
> >> >>>> accuracy of the results as much as possible.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Dileepa
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
>
