Re: How to improve NER results in Stanbol

Dileepa Jayakody Fri, 29 Nov 2013 08:49:42 -0800

Hi Cristian,

Thanks a lot. Yes, the server is up and running, and I can access the server
main page in the browser too.
I was expecting a log entry indicating that server startup had completed.
Sorry for the false alarm, guys.


BTW, I think it would be good to write a log entry when the server startup
completes successfully.

Thanks,
Dileepa


On Fri, Nov 29, 2013 at 8:46 PM, Cristian Petroaca <
cristian.petro...@gmail.com> wrote:

> Hi Dileepa,
>
> I've played with the Stanbol Stanford NLP project for a little while. From
> what I saw, that is normal output; it means the server is up and running.
>
> If you issue a command like
>
>   curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" --data "[YOUR_TEXT]" http://localhost:[PORT]/analysis
>
> replacing [YOUR_TEXT] and [PORT] accordingly, you should see it work and
> return JSON output.
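
For anyone who would rather script this check than use curl, here is a minimal
Python sketch of the same request. It assumes only the /analysis endpoint and
the two headers from the command above. The function names
(build_analysis_request, analyze), the UTF-8 encoding of the body, and the port
value passed in are illustrative assumptions, not part of the
stanbol-stanfordnlp project.

```python
import urllib.request


def build_analysis_request(text: str, port: int, lang: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl command.

    Assumption: the body is sent as UTF-8 encoded plain text.
    """
    return urllib.request.Request(
        url=f"http://localhost:{port}/analysis",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "Content-Language": lang,
        },
        method="POST",
    )


def analyze(text: str, port: int) -> str:
    """Send the request and return the raw response body as a string.

    Requires the Stanford NLP server to be running on the given port.
    """
    with urllib.request.urlopen(build_analysis_request(text, port)) as resp:
        return resp.read().decode("utf-8")
```

Calling analyze("Barclays has appointed Shaygan Kheradpir.", port) against a
running server should return the JSON output mentioned above; a connection
error would suggest that the server has not finished starting.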
>
> Regards,
> Cristian
>
>
> 2013/11/29 Dileepa Jayakody <dileepajayak...@gmail.com>
>
> > Hi Rupert,
> >
> > Thanks again for your suggestions.
> > I cloned and built the stanbol-stanfordnlp project above and executed the
> > run command [1] below in a separate directory. But the server startup
> > doesn't complete; it hangs with the log entry "Reading
> > TokensRegex rules from
> > edu/stanford/nlp/models/sutime/english.holidays.sutime.txt".
> >
> > Any ideas? Can I edit the configuration to skip the above TokensRegex rules
> > and start the server?
> >
> > Thanks,
> > Dileepa
> >
> > [1]
> > dileepa@dileepa-laptop2:~/apache/stanfordNLP_stanbol/server$ java -Xmx1g -jar at.salzburgresearch.stanbol.stanbol.enhancer.nlp.stanford.server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
> > Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
> > Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [6.1 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.3 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.9 sec].
> > Initialization JollyDayHoliday for sutime
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
> > Nov 29, 2013 7:14:24 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
> > INFO: Ignoring inactive rule: temporal-composite-8:ranges
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
> >
> >
> >
> > On Fri, Nov 29, 2013 at 11:48 AM, Rupert Westenthaler <
> > rupert.westentha...@gmail.com> wrote:
> >
> > > Hi Dileepa
> > >
> > > If you need to detect entities that are not part of the controlled
> > > vocabularies, then there is no way around NER. If you want good
> > > results, there is no way around building your own models based
> > > on a custom training set.
> > >
> > > If you need to detect Persons, Organizations and Places, you might have
> > > a look at Stanford NLP with the Stanbol integration [1], as the
> > > models provided by Stanford NLP are much better than those of
> > > OpenNLP.
> > >
> > > best
> > > Rupert
> > >
> > >
> > > [1] https://github.com/westei/stanbol-stanfordnlp
> > >
> > > On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody
> > > <dileepajayak...@gmail.com> wrote:
> > > > Hi Rafa, Rupert,
> > > >
> > > > Thanks a lot for your input. I will look at the options you have
> > > > suggested.
> > > > However, in the first phase of my project I don't require entity linking
> > > > from the Entityhub, because many of the entities mentioned in the content
> > > > I submit will not be available in DBpedia. Therefore I currently also
> > > > don't require the dbpediaLinking and entityhubExtraction engines in the
> > > > default chain I'm using. I will look at implementing a custom vocabulary
> > > > in the second phase of the project for entity linking and disambiguation.
> > > >
> > > > At the moment, I am focusing on improving the accuracy of
> > > > named-entity recognition using NLP techniques. So I think opennlp-chunker
> > > > based improvements will be very helpful at this point.
> > > >
> > > > Do you think the accuracy of NER will be improved if I also associate
> > > > entitylinking with dbpedia, dbpedia-fst-linking?
> > > >
> > > > Thanks,
> > > > Dileepa
> > > >
> > > >
> > > > On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
> > > > rupert.westentha...@gmail.com> wrote:
> > > >
> > > >> Hi Dileepa,
> > > >>
> > > >> I would suggest you also test with a chain that uses Entity Linking
> > > >> instead of Named Entity Linking. Have you tried the
> > > >> "dbpedia-fst-linking" chain? This one is also configured in the
> > > >> default launcher. Please also have a look at STANBOL-1211 [1] that
> > > >> brought a lot of improvements for EntityLinking if you include a
> > > >> chunker (e.g. the opennlp-chunker) in your chain.
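
A chain can be selected per request through the enhancer's REST endpoint. Below
is a minimal Python sketch for building such a request. It assumes a launcher
reachable at http://localhost:8080 and the /enhancer/chain/<name> path; the
helper name build_chain_request and the text/turtle Accept header are
illustrative choices, so adjust them to your installation.

```python
import urllib.parse
import urllib.request


def build_chain_request(
    text: str,
    chain: str,
    base_url: str = "http://localhost:8080",  # assumption: default launcher address
) -> urllib.request.Request:
    """Build (but do not send) a POST request to a named enhancement chain."""
    # Percent-encode the chain name so it is safe to place in the URL path.
    chain_path = urllib.parse.quote(chain, safe="")
    return urllib.request.Request(
        url=f"{base_url}/enhancer/chain/{chain_path}",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "Accept": "text/turtle",  # assumption: RDF returned as Turtle
        },
        method="POST",
    )


if __name__ == "__main__":
    # Sending requires a running Stanbol instance with this chain configured.
    req = build_chain_request(
        "Barclays has appointed Shaygan Kheradpir.", "dbpedia-fst-linking"
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))
```

Sending the same text to the default chain and to "dbpedia-fst-linking" makes it
easy to compare the enhancement results side by side.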
> > > >>
> > > >> best
> > > >> Rupert
> > > >>
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/STANBOL-1211
> > > >>
> > > >> On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody
> > > >> <dileepajayak...@gmail.com> wrote:
> > > >> > Hi Rafa,
> > > >> >
> > > >> > I'm using the default chain;
> > > >> > tika
> > > >> > langdetect
> > > >> > opennlp-sentence
> > > >> > opennlp-token
> > > >> > opennlp-pos
> > > >> > opennlp-ner
> > > >> > dbpediaLinking
> > > >> > entityhubExtraction
> > > >> >
> > > >> > Thanks,
> > > >> > Dileepa
> > > >> >
> > > >> >
> > > > >> > On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:
> > > >> >
> > > >> >> Hi Dileepa,
> > > >> >>
> > > > >> >> Are you using only the OpenNLP NER engine, or are you also including
> > > > >> >> an Entity Linking engine?
> > > >> >>
> > > >> >>
> > > > >> >> On 27/11/13 11:17, Dileepa Jayakody wrote:
> > > >> >>
> > > >> >>> Content:
> > > > >> >>> Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
> > > > >> >>> and Technology Officer. He will join the Executive Committee of Barclays
> > > > >> >>> and report directly to Group Chief Executive Antony Jenkins.
> > > > >> >>>
> > > > >> >>> For the above content, *Barclays* is not identified as an organization,
> > > > >> >>> but *Executive Committee of Barclays* is identified as an organization.
> > > >> >>>
> > > >> >>>
> > > >> >>> How can we improve the accuracy of these results?
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>> Dileepa
> > > >> >>>
> > > >> >>>
> > > > >> >>> On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody
> > > > >> >>> <dileepajayak...@gmail.com> wrote:
> > > > >> >>>
> > > >> >>>> [Typo corrected in the subject of the mail]
> > > >> >>>> ---------- Forwarded message ----------
> > > >> >>>> From: Dileepa Jayakody <dileepajayak...@gmail.com>
> > > >> >>>> Date: Wed, Nov 27, 2013 at 3:40 PM
> > > >> >>>> Subject: How to refinin NER results in Stanbol
> > > >> >>>> To: Stanbol Dev List <dev@stanbol.apache.org>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> Hi All,
> > > >> >>>>
> > > > >> >>>> I have been running some load tests on Stanbol entity recognition,
> > > > >> >>>> with a high load of content extracted from web articles and stored
> > > > >> >>>> in a Solr index.
> > > > >> >>>>
> > > > >> >>>> My objective is to achieve an efficient and accurate enhancement
> > > > >> >>>> result for the content submitted.
> > > >> >>>>
> > > >> >>>> But I think some of the NER results obtained are not accurate.
> > > >> >>>>
> > > > >> >>>> For example, I submit the content:
> > > > >> >>>> Group Finance Director Chris Lucas and Group General Counsel Mark
> > > > >> >>>> Harding to retire from Barclays
> > > > >> >>>>
> > > > >> >>>> I get the entity recognition results below from the default
> > > > >> >>>> enhancement chain:
> > > > >> >>>>
> > > > >> >>>> People: Chris Lucas, Mark Harding
> > > > >> >>>> Organization: Barclays, *BT Group*, *Finance Director Chris Lucas
> > > > >> >>>> and Group General Counsel*
> > > >> >>>>
> > > >> >>>>
> > > > >> >>>> The highlighted organization results above are inaccurate.
> > > > >> >>>> BT Group is not mentioned in the content, and the result *Finance
> > > > >> >>>> Director Chris Lucas and Group General Counsel* is not an
> > > > >> >>>> organization but rather a phrase.
> > > > >> >>>> Further, if I add a full stop (.) to the end of the sentence,
> > > > >> >>>> "Barclays" is not recognized as an Organization.
> > > >> >>>>
> > > > >> >>>> I think we need to improve these results in Stanbol NER. Can we
> > > > >> >>>> tweak the OpenNLP-NER component for this?
> > > > >> >>>>
> > > > >> >>>> Any ideas/pointers on how to refine these enhancement results will
> > > > >> >>>> be immensely helpful.
> > > > >> >>>> I'm looking for a way to improve the accuracy of the results as
> > > > >> >>>> much as possible.
> > > >> >>>>
> > > >> >>>> Thanks,
> > > >> >>>> Dileepa
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> > > >> | Bodenlehenstraße 11                             ++43-699-11108907
> > > >> | A-5500 Bischofshofen
> > > >>
> > >
> > >
> > >
> > > --
> > > | Rupert Westenthaler             rupert.westentha...@gmail.com
> > > | Bodenlehenstraße 11                             ++43-699-11108907
> > > | A-5500 Bischofshofen
> > >
> >
>
