Hi Cristian,

Thanks a lot. Yes, the server is up and running, and I can access the server main page in the browser too. I was expecting a log entry indicating that server startup had completed... sorry for the false alarm, guys.
BTW, I think it would be good to write a log entry when server startup completes successfully.

Thanks,
Dileepa

On Fri, Nov 29, 2013 at 8:46 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Hi Dileepa,
>
> I've played with the Stanbol Stanford NLP project for a little while. From what I saw, that is normal output; it means the server is up and running.
>
> If you issue a command like
>
> curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" --data "[YOUR_TEXT]" http://localhost:[PORT]/analysis
>
> replacing [YOUR_TEXT] and [PORT] accordingly, you should see it work and return JSON output.
>
> Regards,
> Cristian
>
> 2013/11/29 Dileepa Jayakody <dileepajayak...@gmail.com>
>
> > Hi Rupert,
> >
> > Thanks again for your suggestions.
> > I cloned and built the stanbol-stanfordnlp project above and executed the run command [1] below in a separate directory. But the server startup doesn't complete; it hangs at the log entry: "Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt"
> >
> > Any ideas? Can I edit the configuration to skip the above TokensRegex rules and start the server?
> >
> > Thanks,
> > Dileepa
> >
> > [1]
> > dileepa@dileepa-laptop2:~/apache/stanfordNLP_stanbol/server$ java -Xmx1g -jar at.salzburgresearch.stanbol.stanbol.enhancer.nlp.stanford.server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
> > Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
> > Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [6.1 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.3 sec].
> > Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.9 sec].
> > Initialization JollyDayHoliday for sutime
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
> > Nov 29, 2013 7:14:24 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
> > INFO: Ignoring inactive rule: temporal-composite-8:ranges
> > Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
> >
> > On Fri, Nov 29, 2013 at 11:48 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> >
> > > Hi Dileepa
> > >
> > > If you need to detect Entities that are not part of the Controlled Vocabularies, then there is no way around NER. If you want good results, there will be no way around building your own models based on a custom training set.
> > >
> > > If you need to detect Persons, Organizations and Places, you might have a look at Stanford NLP with the Stanbol integration [1], as the models provided by Stanford NLP are much better than those of OpenNLP.
> > >
> > > best
> > > Rupert
> > >
> > > [1] https://github.com/westei/stanbol-stanfordnlp
> > >
> > > On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> > > > Hi Rafa, Rupert,
> > > >
> > > > Thanks a lot for your input. I will look at the options you have suggested.
> > > > However, in the first phase of my project I don't require entity-linking from entity-hub, because many of the entities mentioned in the content I submit will not be available in dbpedia. Therefore I currently also don't require the dbpediaLinking and entityhubExtraction engines in the default chain I'm using.
> > > > I will look at implementing a custom vocabulary in the second phase of the project for entity-linking and disambiguation purposes.
> > > >
> > > > At the moment, I am focusing on improving the accuracy of named-entity recognition using NLP techniques. So I think opennlp-chunker based improvements will be very helpful at this point.
> > > >
> > > > Do you think the accuracy of NER will be improved if I also associate entitylinking with dbpedia, dbpedia-fst-linking?
> > > >
> > > > Thanks,
> > > > Dileepa
> > > >
> > > > On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> > > >
> > > >> Hi Dileepa,
> > > >>
> > > >> I would suggest you also test with a chain that uses Entity Linking instead of Named Entity Linking. Have you tried the "dbpedia-fst-linking" chain? This one is also configured in the default launcher. Please also have a look at STANBOL-1211 [1], which brought a lot of improvements for EntityLinking if you include a chunker (e.g. the opennlp-chunker) in your chain.
> > > >>
> > > >> best
> > > >> Rupert
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/STANBOL-1211
> > > >>
> > > >> On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> > > >> > Hi Rafa,
> > > >> >
> > > >> > I'm using the default chain:
> > > >> > tika
> > > >> > langdetect
> > > >> > opennlp-sentence
> > > >> > opennlp-token
> > > >> > opennlp-pos
> > > >> > opennlp-ner
> > > >> > dbpediaLinking
> > > >> > entityhubExtraction
> > > >> >
> > > >> > Thanks,
> > > >> > Dileepa
> > > >> >
> > > >> > On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:
> > > >> >
> > > >> >> Hi Dileepa,
> > > >> >>
> > > >> >> Are you using only the OpenNLP NER engine, or are you also including an Entity Linking engine?
> > > >> >>
> > > >> >> On 27/11/13 11:17, Dileepa Jayakody wrote:
> > > >> >>
> > > >> >>> Content:
> > > >> >>> Barclays has appointed Shaygan Kheradpir to the role of Chief Operations and Technology Officer. He will join the Executive Committee of Barclays and report directly to Group Chief Executive Antony Jenkins.
> > > >> >>>
> > > >> >>> For the above content, *Barclays* is not identified as an organization, but *Executive Committee of Barclays* is identified as an organization.
> > > >> >>>
> > > >> >>> How can we improve the accuracy of these results?
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>> Dileepa
> > > >> >>>
> > > >> >>> On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> > > >> >>>
> > > >> >>>> [Typo corrected in the subject of the mail]
> > > >> >>>> ---------- Forwarded message ----------
> > > >> >>>> From: Dileepa Jayakody <dileepajayak...@gmail.com>
> > > >> >>>> Date: Wed, Nov 27, 2013 at 3:40 PM
> > > >> >>>> Subject: How to refinin NER results in Stanbol
> > > >> >>>> To: Stanbol Dev List <dev@stanbol.apache.org>
> > > >> >>>>
> > > >> >>>> Hi All,
> > > >> >>>>
> > > >> >>>> I have been running some load tests on Stanbol entity recognition, with a high load of content extracted from web articles and stored in a Solr index.
> > > >> >>>>
> > > >> >>>> My objective is to achieve an efficient and accurate enhancement result for the content submitted.
> > > >> >>>>
> > > >> >>>> But I think some of the NER results obtained are not accurate.
> > > >> >>>>
> > > >> >>>> For example, I submit the content:
> > > >> >>>> Group Finance Director Chris Lucas and Group General Counsel Mark Harding to retire from Barclays
> > > >> >>>>
> > > >> >>>> I get the entity recognition results below from the default enhancement chain:
> > > >> >>>>
> > > >> >>>> People: Chris Lucas, Mark Harding
> > > >> >>>> Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and Group General Counsel*
> > > >> >>>>
> > > >> >>>> The highlighted NERs for organizations above are inaccurate results.
> > > >> >>>> BT Group is not mentioned in the content, and the result *Finance Director Chris Lucas and Group General Counsel* is not an organization but rather a phrase.
> > > >> >>>> Further, if I add a full stop (.) to the end of the sentence, "Barclays" is not recognized as an Organization.
> > > >> >>>>
> > > >> >>>> I think we need to improve these results in Stanbol NER. Can we tweak the OpenNLP-NER component for this?
> > > >> >>>>
> > > >> >>>> Any ideas/pointers on how to refine these enhancement results will be immensely helpful.
> > > >> >>>> I'm looking for a way to improve the accuracy of the results as much as possible.
> > > >> >>>>
> > > >> >>>> Thanks,
> > > >> >>>> Dileepa
> > > >>
> > > >> --
> > > >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> > > >> | Bodenlehenstraße 11             ++43-699-11108907
> > > >> | A-5500 Bischofshofen
> > >
> > > --
> > > | Rupert Westenthaler             rupert.westentha...@gmail.com
> > > | Bodenlehenstraße 11             ++43-699-11108907
> > > | A-5500 Bischofshofen
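
For anyone who wants to script the check instead of typing the curl command quoted earlier in this thread, here is a minimal Python sketch of the same request. Only the /analysis path, the POST method, and the two headers come from Cristian's message; the function names are made up for illustration, the port is left as a parameter because the thread never states it, and decoding the reply as JSON is an assumption based on his remark that the server gives JSON output.

```python
import json
import urllib.request


def build_analysis_request(text, port, language="en"):
    """Build (but do not send) the POST request described in the thread.

    `port` is whatever port your local server listens on; the thread
    only shows it as a [PORT] placeholder.
    """
    url = f"http://localhost:{port}/analysis"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # request body: the raw text to analyse
        headers={"Content-Type": "text/plain", "Content-Language": language},
        method="POST",
    )


def analyse(text, port, language="en", timeout=60):
    """Send the text to the server and decode the reply.

    Assumes the response body is JSON, as reported in the thread.
    """
    request = build_analysis_request(text, port, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling analyse("some text", port) against a running server should return the decoded reply; if the server is still loading its models, the call would be expected to fail with a connection error or time out, which gives a quick way to tell "still starting" from "up and running".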