Hi Pajolma,

Sorry, I misunderstood "performance" :) I thought we were talking about the quality of the extractions.

If it is benchmarking time, then yes, you could call the given endpoints and subtract the times. Another possibility is to take a look at SpotlightInterface, which encodes the full pipelines for `candidates`, `annotate` and `spot`; you could isolate those calls and run them over a test set that you provide.
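To make the first option concrete, here is a rough sketch in plain Java of what "call the endpoints and subtract the times" could look like. The base URL, port, and sample text are placeholders for your own deployment, and the timings include HTTP overhead, so you would want to average over many calls and a realistic test set rather than a single request:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class SpotlightTiming {

        // Placeholder: point this at your own Spotlight instance.
        private static final String BASE = "http://localhost:2222/rest/";

        // Calls one endpoint with the given text and returns elapsed wall-clock millis.
        static long timeEndpoint(String endpoint, String text) throws Exception {
            URL url = new URL(BASE + endpoint + "?text=" + URLEncoder.encode(text, "UTF-8"));
            long start = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) { /* drain the response */ }
            }
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws Exception {
            String text = "Berlin is the capital of Germany."; // sample input
            long spot = timeEndpoint("spot", text);
            long candidates = timeEndpoint("candidates", text);
            long annotate = timeEndpoint("annotate", text);
            System.out.println("spot:       " + spot + " ms");
            System.out.println("candidates: " + candidates + " ms");
            System.out.println("annotate:   " + annotate + " ms");
            // Rough estimate following the subtraction logic from this thread:
            // disambiguation ~ annotate - candidates, ignoring HTTP overhead
            // and any pipeline steps the two endpoints share.
            System.out.println("disambiguation (approx): " + (annotate - candidates) + " ms");
        }
    }

Isolating the calls inside SpotlightInterface would give you cleaner numbers, since it removes the HTTP layer entirely.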
On Thu, Jun 4, 2015 at 4:30 PM, Pajolma Rupi <[email protected]> wrote:

> Hi David,
>
> I managed to find the kore50 corpus but not the milne-witten one. Do you
> know if it's still publicly available?
>
> In order to test the time performance of each phase, I was thinking to use
> the available endpoints:
>
> 1- spot
> 2- candidates
> 3- disambiguate
> 4- annotate
>
> Because using the *disambiguate* endpoint would require me to provide NE
> annotations in my call, I was thinking to use the *annotate* endpoint
> instead and subtract the time consumed by the *candidates* endpoint, in
> order to get the time consumed by the disambiguation phase. Would such
> logic be correct with respect to the implementation? Is there any other
> phase in the pipeline (between disambiguation and annotation) which might
> affect this logic? If I understood it well, the pipeline consists of the
> processing done by each of the endpoints in the order that I've listed
> them above. Please let me know if that is not the case.
>
> Thank you in advance,
> Pajolma
>
> ------------------------------
>
> *From: *"David Przybilla" <[email protected]>
> *To: *"Pajolma Rupi" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Tuesday, June 2, 2015 6:45:19 PM
> *Subject: *Re: [Dbp-spotlight-users] Time performance for each phase
>
> Hi Pajolma,
>
> As far as I know there are no separate evaluations out of the box, but you
> could use the milne-witten corpus to evaluate the spotter and the
> disambiguation separately.
>
> In my experience problems are usually related to spotting: surface forms
> which are not in the models, or surface forms without enough probability.
>
> There is also a corpus specifically for evaluating disambiguation (kore50).
>
> On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]> wrote:
>
>> Dear all,
>>
>> I was not able to find any information regarding the time performance of
>> the Spotlight service for each of its phases separately: phrase spotting
>> (candidate generation, candidate selection), disambiguation, indexing.
>> There are some numbers in the paper "*Improving efficiency and accuracy
>> in multilingual entity extraction*", but they are calculated over the
>> whole annotation process, whereas I am interested in knowing during which
>> specific phase the service performs better and during which it performs
>> worse.
>>
>> Could you please let me know if such information already exists?
>> I would also be interested in knowing whether I can produce such
>> information by running my own local instance of Spotlight (I'm using
>> Java to annotate text).
>>
>> Thank you in advance,
>> Pajolma
