Hi David, 

I managed to find the kore50 corpus but not the milne-witten one. Do you know 
if it's still publicly available? 

In order to test the time performance of each phase, I was thinking of using the 
available endpoints: 

1. spot 
2. candidates 
3. disambiguate 
4. annotate 

Since the disambiguate endpoint requires NE annotations to be provided in the 
call, I was thinking of using the annotate endpoint instead and subtracting the 
time consumed by the candidates endpoint, in order to get the time consumed by 
the disambiguation phase alone. Would such logic be correct with respect to the 
implementation? Is there any other phase in the pipeline (between disambiguation 
and annotation) that might affect this logic? If I understood correctly, the 
pipeline consists of the processing done by each of the endpoints in the order 
I've listed above. Please let me know if that is not the case. 
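
To make the plan concrete, here is a minimal Java sketch of that subtraction 
approach, assuming a local instance at http://localhost:2222/rest and the 
standard text/confidence parameters (the confidence value and sample text are 
just placeholders): 

import java.net.URI; 
import java.net.URLEncoder; 
import java.net.http.HttpClient; 
import java.net.http.HttpRequest; 
import java.net.http.HttpResponse; 
import java.nio.charset.StandardCharsets; 

public class PhaseTiming { 

    // Base URL of a local Spotlight instance; adjust host/port to your setup. 
    private static final String BASE = "http://localhost:2222/rest/"; 
    private static final HttpClient CLIENT = HttpClient.newHttpClient(); 

    // Time one POST to the given endpoint; returns elapsed nanoseconds. 
    static long timeEndpoint(String endpoint, String text) throws Exception { 
        String body = "text=" + URLEncoder.encode(text, StandardCharsets.UTF_8) 
                + "&confidence=0.35"; 
        HttpRequest request = HttpRequest.newBuilder(URI.create(BASE + endpoint)) 
                .header("Content-Type", "application/x-www-form-urlencoded") 
                .header("Accept", "application/json") 
                .POST(HttpRequest.BodyPublishers.ofString(body)) 
                .build(); 
        long start = System.nanoTime(); 
        CLIENT.send(request, HttpResponse.BodyHandlers.ofString()); 
        return System.nanoTime() - start; 
    } 

    public static void main(String[] args) throws Exception { 
        String text = "Berlin is the capital of Germany."; 
        timeEndpoint("annotate", text); // warm-up call, not measured 
        long candidates = timeEndpoint("candidates", text); 
        long annotate = timeEndpoint("annotate", text); 
        // If annotate = spotting + candidate selection + disambiguation, 
        // the difference approximates the disambiguation phase alone. 
        System.out.printf("candidates: %.1f ms%n", candidates / 1e6); 
        System.out.printf("annotate: %.1f ms%n", annotate / 1e6); 
        System.out.printf("estimated disambiguation: %.1f ms%n", 
                (annotate - candidates) / 1e6); 
    } 
} 

Of course, each measurement also includes HTTP and serialization overhead, so I 
would warm the service up first and average over many calls rather than trust a 
single run. 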

Thank you in advance, 
Pajolma 

----- Original Message -----


From: "David Przybilla" <[email protected]> 
To: "Pajolma Rupi" <[email protected]> 
Cc: [email protected] 
Sent: Tuesday, June 2, 2015 6:45:19 PM 
Subject: Re: [Dbp-spotlight-users] Time performance for each phase 

Hi Pajolma, 

As far as I know there are no separate evaluations out of the box, but you 
could use the milne-witten corpus to evaluate the spotter and the 
disambiguation separately. 

In my experience, problems are usually related to spotting: surface forms that 
are not in the models, or surface forms without enough probability. 
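
For example, you could probe whether a given surface form is spotted at all by 
calling the spot endpoint directly and checking the response. A rough sketch, 
assuming a local instance on port 2222 (the example text and phrase are 
arbitrary, and the substring check is crude; the output format depends on the 
version, so parsing the response properly would be more robust): 

import java.net.URI; 
import java.net.URLEncoder; 
import java.net.http.HttpClient; 
import java.net.http.HttpRequest; 
import java.net.http.HttpResponse; 
import java.nio.charset.StandardCharsets; 

public class SpotCheck { 
    public static void main(String[] args) throws Exception { 
        String text = "Let It Be was released by the Beatles in 1970."; 
        String surfaceForm = "Let It Be"; // phrase we expect the spotter to find 
        String body = "text=" + URLEncoder.encode(text, StandardCharsets.UTF_8); 
        HttpRequest request = HttpRequest 
                .newBuilder(URI.create("http://localhost:2222/rest/spot")) 
                .header("Content-Type", "application/x-www-form-urlencoded") 
                .header("Accept", "application/json") 
                .POST(HttpRequest.BodyPublishers.ofString(body)) 
                .build(); 
        String response = HttpClient.newHttpClient() 
                .send(request, HttpResponse.BodyHandlers.ofString()) 
                .body(); 
        // Crude check: if the surface form never shows up in the spotter 
        // output, it is probably missing from the model or below the 
        // spotting threshold. 
        System.out.println((response.contains(surfaceForm) 
                ? "Spotted: " : "Not spotted: ") + surfaceForm); 
    } 
} 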

There is also a specific corpus for evaluating disambiguation (kore50). 


On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]> wrote: 


Dear all, 

I was not able to find any information regarding the time performance of the 
Spotlight service for each of the phases separately: phrase spotting (candidate 
generation, candidate selection), disambiguation, indexing. There are some 
numbers in the paper "Improving efficiency and accuracy in multilingual entity 
extraction", but they are calculated in the context of the whole annotation 
process, whereas I'm interested in knowing during which specific phase the 
service performs better and during which it performs worse. 

Could you please let me know if such information already exists? 
I would also be interested in knowing whether I can produce such information by 
running my own local instance of Spotlight (I'm using Java to annotate text). 

Thank you in advance, 
Pajolma 



------------------------------------------------------------------------------
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
