Hi Pajolma,

Taking a closer look, it may be that the easiest approach is to measure
the time in the rest module. If you take a look at [1], you can see all
the classes that match the endpoints. If you wrap one of those methods
with time-measuring code and then make the right request to Spotlight,
depending on the method you wrapped (i.e. POST, GET, html, xml, etc.),
it should be straightforward :)

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/tree/master/rest/src/main/java/org/dbpedia/spotlight/web/rest/resources
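Something along these lines might do it. This is just a rough sketch
from me: the Timed helper below is hypothetical, not part of the
codebase.

    import java.util.concurrent.Callable;

    // Hypothetical helper (not in Spotlight): wraps a piece of work
    // and prints how long it took.
    public class Timed {
        public static <T> T time(String label, Callable<T> work) throws Exception {
            long start = System.nanoTime();
            try {
                return work.call();
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.err.println(label + " took " + elapsedMs + " ms");
            }
        }
    }

In whichever resource method you pick, you would then wrap the existing
body in an anonymous Callable (or a lambda on Java 8), e.g.
Timed.time("annotate/json", ...), and fire the matching request at your
local server.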
On Thu, Jun 11, 2015 at 9:23 AM, Pajolma Rupi <[email protected]> wrote:

> Hello David,
>
> Can I ask you for a few more hints on how to isolate the calls for
> `candidates`, `annotate` and `spot` while using the SpotlightInterface?
>
> Here is what I tried (still using the jar file via Java):
>
> SpotlightInterface si = new SpotlightInterface("annotate/");
> String text = "Sherlock Holmes plot took place in 1891, in London.";
> String inUrl = "http://localhost:2222/rest/annotate/";
> double confidence = 0.0;
> int support = 10;
> String dbpediaTypesString = "";
> String sparqlQuery = "";
> String policy = "whitelist";
> boolean coreferenceResolution = true;
> String clientIp = "";
> String spotterName = "CoOccurrenceBasedSelector";
> String disambiguator = "DefaultDisambiguator";
>
> String result = si.getJSON(text, inUrl, confidence, support,
>         dbpediaTypesString, sparqlQuery, policy, coreferenceResolution,
>         clientIp, spotterName, disambiguator);
>
> But I get the following error:
>
> *Exception in thread "main" org.dbpedia.spotlight.exceptions.InputException:
> No spotters were loaded. Please add one of []....*
>
> Could you please let me know which spotter and disambiguator I should
> use in order to get correct results? (I'm trying to guess the next
> problem I might have with the disambiguator name :) )
>
> @Alex
> Thanks a lot for your comments! My concern is that I'm not sure the
> `candidates` endpoint and the `disambiguate` one are fully separated
> from each other: I have the impression that part of the disambiguation
> logic may already be performed during candidate generation (the
> contextual score calculation), which makes me doubt the significance of
> simply comparing the time performance of the different endpoints. Let
> me know if you see it differently.
> You're right, I am using the statistical version, but the paper you
> point to ("Improving efficiency and accuracy in multilingual entity
> extraction") doesn't give me enough helpful information.
>
> Thank you in advance,
> Pajolma
>
> ------------------------------
>
> *From: *"Alex Olieman" <[email protected]>
> *To: *[email protected]
> *Sent: *Tuesday, June 9, 2015 12:51:41 PM
> *Subject: *Re: [Dbp-spotlight-users] Time performance for each phase
>
> Hi Pajolma,
>
> You may want to adjust what you measure. Both the annotate and
> candidates endpoints encompass spotting, so it is entirely expected
> that spot+candidates takes longer than annotate alone. In the (old)
> IR-based implementation (from the paper you cite) you may be able to
> make sense of this by comparing the timings of spot+disambiguate with
> annotate. Their total times of completion should be (roughly)
> equivalent, if I understand correctly. I'm not sure whether this is the
> same for the newer statistical version, but it can at least be verified
> this way.
>
> The difference between annotate and candidates is merely that annotate
> selects the candidate with the highest disambiguation score that passes
> the confidence threshold. That should make for an insignificant
> difference in runtime.
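> In other words, roughly (an illustrative pseudo-Java sketch only, not
> Spotlight's actual code):
>
>     // Illustrative only: candidates returns every entry; annotate
>     // keeps just the highest-scoring one above the confidence threshold.
>     public class AnnotateVsCandidates {
>         static class Candidate {
>             final String uri;
>             final double score;
>             Candidate(String uri, double score) { this.uri = uri; this.score = score; }
>         }
>
>         static Candidate annotate(Iterable<Candidate> candidates, double confidence) {
>             Candidate best = null;
>             for (Candidate c : candidates) {
>                 if (c.score >= confidence && (best == null || c.score > best.score)) {
>                     best = c;
>                 }
>             }
>             return best;
>         }
>     }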
> Someone please correct me if I'm wrong.
>
> David's suggestion to use SpotlightInterface to measure the timings of
> the various pipelines seems the way to go if you want to do these
> measurements cleanly from your Java code. I'm not a Java dev, however,
> so for API usage tips many of the other subscribers to this list would
> have a better chance of helping you.
>
> For more background on how the faster statistical implementation (which
> you are most likely using) builds its models, and what it needs to do
> at runtime for spotting and disambiguation, please see Joachim Daiber
> et al., "Improving efficiency and accuracy in multilingual entity
> extraction". I've also written about this for the ERD'14 challenge:
> http://www.e.humanities.uva.nl/publications/2014/olie:enti14.pdf
>
> There is at least one relevant difference between language-independent
> and language-dependent spotting, which is configurable. The first,
> lexicon-based Aho-Corasick spotting, should be significantly faster
> than OpenNLP spotting. Intuitively, I would say that disambiguation
> should take longer than spotting, but Jo et al. did an exceptional job
> of fitting the models in memory and speeding this up. So I'm also very
> interested in what you will discover!
>
> Best of luck,
>
> Alex
>
> On 9-6-2015 9:57, Pajolma Rupi wrote:
>
> Hi David,
>
> Yes, my objective was to test the running time for each endpoint, so
> that I get an idea of which phase takes longest during the annotation
> process. I ran a few tests with small text files, and it seems that the
> phrase spotting phase (spot endpoint + candidates endpoint) takes
> longer than the disambiguation one (annotate endpoint). My explanation
> would be that during the disambiguation phase only the contextual score
> is taken into account (if I understood the paper *DBpedia Spotlight:
> Shedding Light on the Web of Documents* correctly, the resource with
> the biggest contextual score is chosen), and this score is already
> calculated during phrase spotting (more precisely, during the candidate
> generation sub-phase). Given this, disambiguation consists of just
> choosing the resource with the biggest contextual score, and so takes
> much less time than phrase spotting. Please let me know if you have a
> different opinion on the matter.
>
> Best,
> Pajolma
>
> ------------------------------
>
> *From: *"David Przybilla" <[email protected]>
> *To: *"Pajolma Rupi" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Friday, June 5, 2015 10:19:07 AM
> *Subject: *Re: [Dbp-spotlight-users] Time performance for each phase
>
> Hi Pajolma,
>
> Sorry, I misunderstood "performance" :) and I thought we were talking
> about the quality of the extractions.
>
> If it is benchmarking time, then I guess yes, you could call the given
> endpoints and subtract the times.
>
> Another possibility is to take a look at SpotlightInterface, which
> encodes all the pipelines for `candidates`, `annotate` and `spot`, then
> isolate the calls, passing some test set that you provide.
>
> On Thu, Jun 4, 2015 at 4:30 PM, Pajolma Rupi <[email protected]>
> wrote:
>
>> Hi David,
>>
>> I managed to find the kore50 corpus but not the milne-witten one. Do
>> you know if it's still publicly available?
>>
>> In order to test the time performance of each phase, I was thinking
>> of using the available endpoints:
>>
>> 1-spot
>> 2-candidates
>> 3-disambiguate
>> 4-annotate
>>
>> Because using the *disambiguate* endpoint would require me to provide
>> NE annotations in my call, I was thinking of using the *annotate*
>> endpoint instead and subtracting the time consumed by the
>> *candidates* endpoint, in order to get the time consumed by the
>> disambiguation phase.
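>> Concretely, I had something like this in mind (a rough sketch; the
>> parameter defaults are guesses and error handling is omitted):
>>
>> import java.io.InputStream;
>> import java.net.HttpURLConnection;
>> import java.net.URL;
>> import java.net.URLEncoder;
>>
>> public class EndpointTimer {
>>
>>     // Time one GET request to a local Spotlight endpoint, in ms.
>>     static long timeGet(String endpoint, String text) throws Exception {
>>         String query = "?text=" + URLEncoder.encode(text, "UTF-8")
>>                 + "&confidence=0.0&support=10";
>>         URL url = new URL("http://localhost:2222/rest/" + endpoint + query);
>>         long start = System.nanoTime();
>>         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>>         conn.setRequestProperty("Accept", "application/json");
>>         InputStream in = conn.getInputStream();
>>         while (in.read() != -1) { /* drain the response */ }
>>         in.close();
>>         return (System.nanoTime() - start) / 1_000_000;
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         String text = "Sherlock Holmes plot took place in 1891, in London.";
>>         long annotate = timeGet("annotate", text);
>>         long candidates = timeGet("candidates", text);
>>         System.out.println("annotate:        " + annotate + " ms");
>>         System.out.println("candidates:      " + candidates + " ms");
>>         System.out.println("disambiguation ~ " + (annotate - candidates) + " ms");
>>     }
>> }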
>> Would such logic be correct with respect to the implementation? Is
>> there any other phase in the pipeline (between disambiguation and
>> annotation) which might affect this logic? If I understood it well,
>> the pipeline consists of the processing done by each of the endpoints
>> in the order that I've listed them above. Please let me know if that
>> is not the case.
>>
>> Thank you in advance,
>> Pajolma
>>
>> ------------------------------
>>
>> *From: *"David Przybilla" <[email protected]>
>> *To: *"Pajolma Rupi" <[email protected]>
>> *Cc: *[email protected]
>> *Sent: *Tuesday, June 2, 2015 6:45:19 PM
>> *Subject: *Re: [Dbp-spotlight-users] Time performance for each phase
>>
>> Hi Pajolma,
>>
>> As far as I know there are no separate evaluations out of the box, but
>> you could use the milne-witten corpus to evaluate the spotter and the
>> disambiguation separately.
>>
>> In my experience, problems are usually related to spotting: surface
>> forms which are not in the models, or surface forms without enough
>> probability.
>>
>> There is also a specific corpus for evaluating disambiguation (kore50).
>>
>> On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]>
>> wrote:
>>
>>> Dear all,
>>>
>>> I was not able to find information regarding the time performance of
>>> the Spotlight service for each of the phases separately: phrase
>>> spotting (candidate generation, candidate selection), disambiguation,
>>> indexing. There are some numbers in the paper "*Improving efficiency
>>> and accuracy in multilingual entity extraction*", but they are
>>> calculated over the whole annotation process, whereas I'm interested
>>> in knowing during which specific phase the service performs better
>>> and during which phase it performs worse.
>>>
>>> Could you please let me know if such information already exists?
>>> I would also be interested in knowing whether I can produce such
>>> information by running my own local instance of Spotlight (I'm using
>>> Java to annotate text).
>>>
>>> Thank you in advance,
>>> Pajolma
