Hi Pajolma,

Sorry, I misunderstood "performance" :) and I thought we were talking
about the quality of the extractions.

If it is benchmarking time, then I guess yes, you could call the given
endpoints, time each call, and subtract as you describe.
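
For instance, a quick timing harness in Java (11+) could look like the
sketch below. It assumes a local instance on http://localhost:2222 and
passes only a minimal parameter set, so adjust as needed:

    // Minimal timing sketch (Java 11+). Assumes a local Spotlight
    // instance on http://localhost:2222; the endpoint paths follow the
    // REST API, but the parameters are reduced to the essentials.
    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class EndpointTimer {

        static final HttpClient CLIENT = HttpClient.newHttpClient();

        // Returns the wall-clock time of one request in milliseconds.
        static long timeEndpoint(String endpoint, String text) throws Exception {
            String url = "http://localhost:2222/rest/" + endpoint
                    + "?text=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                    + "&confidence=0.5";
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Accept", "application/json")
                    .build();
            long start = System.nanoTime();
            CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws Exception {
            String text = "Berlin is the capital of Germany.";
            long annotate = timeEndpoint("annotate", text);
            long candidates = timeEndpoint("candidates", text);
            // Rough estimate of the disambiguation phase, as you proposed:
            System.out.println("disambiguation ~ " + (annotate - candidates) + " ms");
        }
    }

Note this measures end-to-end request time, including HTTP and
serialization overhead, so average over many requests (after a warm-up
pass) to get stable numbers.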

Another possibility is to take a look at SpotlightInterface, which encodes
the pipelines for `candidates`, `annotate` and `spot`; you could isolate
the individual calls and run them against a test set of your own.
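
I don't remember the exact signatures inside SpotlightInterface off the
top of my head, so the calls in the sketch below are placeholders; the
idea is just a small in-process harness around each isolated stage:

    import java.util.function.Supplier;

    public class StageTimer {

        // Runs one pipeline stage repeatedly and reports the mean
        // wall-clock time, smoothing out JIT warm-up and GC noise.
        static <T> double meanMillis(Supplier<T> stage, int runs) {
            stage.get(); // warm-up pass
            long total = 0;
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                stage.get();
                total += System.nanoTime() - start;
            }
            return total / (runs * 1_000_000.0);
        }

        public static void main(String[] args) {
            // Placeholder: swap in the actual spotting / candidate /
            // disambiguation calls you isolate from SpotlightInterface.
            double spotMs = meanMillis(() -> {
                /* e.g. spotter call over one document of your test set */
                return null;
            }, 20);
            System.out.println("spotting ~ " + spotMs + " ms");
        }
    }

Measuring in-process like this avoids the HTTP overhead, so the split
between spotting and disambiguation should be cleaner than subtracting
endpoint times.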

On Thu, Jun 4, 2015 at 4:30 PM, Pajolma Rupi <[email protected]> wrote:

> Hi David,
>
> I managed to find the kore50 corpus but not the milne-witten one. Do you
> know if it's still publicly available?
>
> In order to test the time performance of each phase, I was thinking of using
> the available endpoints:
>
> 1-spot
> 2-candidates
> 3-disambiguate
> 4-annotate
>
> Because using the *disambiguate* endpoint would require me to provide NE
> annotations in my call, I was thinking of using the *annotate* endpoint
> instead and subtracting the time consumed by the *candidates* endpoint in
> order to get the time consumed by the disambiguation phase.
> Would such logic be correct with respect to the implementation? Is there
> any other phase in the pipeline (between disambiguation and annotation)
> which might affect this logic? If I understood it well, the pipeline
> consists of the processing done by each of the endpoints in the order
> I've listed them above. Please let me know if that is not the case.
>
> Thank you in advance,
> Pajolma
>
> ------------------------------
>
> *From: *"David Przybilla" <[email protected]>
> *To: *"Pajolma Rupi" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Tuesday, June 2, 2015 6:45:19 PM
> *Subject: *Re: [Dbp-spotlight-users] Time performance for each phase
>
>
> Hi Pajolma,
>
> As far as I know there are no separate evaluations out of the box, but you
> could use the milne-witten corpus to evaluate the spotting and
> disambiguation phases separately.
>
> In my experience, problems are usually related to spotting: surface forms
> that are not in the models, or surface forms with too low a probability.
>
> There is also a specific corpus for evaluating disambiguation (kore50).
>
>
>
> On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]>
> wrote:
>
>> Dear all,
>>
>> I was not able to find any information regarding the time performance of
>> the Spotlight service for each of the phases separately: phrase spotting
>> (candidate generation, candidate selection), disambiguation, indexing.
>> There are some numbers in the paper "*Improving efficiency and accuracy
>> in multilingual entity extraction*", but they are calculated over the
>> whole annotation process, whereas I'm interested in knowing during which
>> specific phase the service performs better and during which it performs
>> worse.
>>
>> Could you please let me know if such information exists already?
>> I would also be interested in knowing if I can produce such information
>> by running my own local instance of Spotlight (I'm using Java in order to
>> annotate text).
>>
>> Thank you in advance,
>> Pajolma
>>
>>
>
>
------------------------------------------------------------------------------
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
