Hi David,

Thank you for your answer.

So I understand that this must be an issue with the stemmer/FSA creation.
Is there something I can do to solve it?

Thanks a lot again,

Côme



2014-05-20 15:11 GMT+02:00 David Przybilla <[email protected]>:

> Hi Alex and Côme,
>
> I'm checking your cases (see below for the French case).
>
> Here are some stats on your sample:
>
> Annotated count of SF:
> 4
> Total counts of SF:
> 55
> Annotation probability
> 0.07272727272727272
>
> ---------------Candidates------------------------------------
> http://dbpedia.org/resource/Eidetic_memory
> http://dbpedia.org/resource/Recall_(memory)
>
> The surface form is in the main Surface Form storage
>
> This sample is not spottable because of its low annotation probability,
> and yes, the movie is not among the candidates.
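
The annotation probability quoted above is the annotated count divided by the total count. A minimal sketch of that arithmetic in Python; `is_spottable` and its `min_probability` cut-off are hypothetical names used only for illustration, not values or functions taken from Spotlight:

```python
def annotation_probability(annotated_count: int, total_count: int) -> float:
    """Annotated count of a surface form divided by its total count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return annotated_count / total_count


def is_spottable(probability: float, min_probability: float) -> bool:
    # min_probability is an assumed threshold; the real filter may differ.
    return probability >= min_probability


p = annotation_probability(4, 55)  # counts reported for this sample
print(round(p, 4))                 # 0.0727
print(is_spottable(p, min_probability=0.5))  # 0.5 is an arbitrary example value
```

With any reasonably strict cut-off, a surface form annotated in only 4 of 55 occurrences would be filtered out, which matches the behaviour described here.
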
>
> ==================
>
> So I think this problem is different from the issue you previously posted
> on GitHub.
>
> The FSA definitely seems to improve the handling of lowercase forms, simply
> because it is built on all the surface forms in the main store. This means
> that the lowercase forms of everything in the surface form store are, in
> principle, spottable. However, there are some filters.
>
> There are two surface form storages: the main storage (where uppercase
> forms are supposed to be stored) and the lowercase storage.
>
> The main one is supposed to store uppercase surface forms; however, as far
> as I understand, it also stores lowercase forms which were explicitly
> annotated in the data used to generate the models. So in your case there
> was a "total recall" annotation with the candidate topics you see in the
> output. That's why it lives in the main storage.
>
> The lowercase storage is meant to store artificial surface forms created
> by lowercasing the entries of the main storage.
>
> When the spotter is called, it first checks the main storage, and only if
> nothing is found there does it check the lowercase storage. So in your
> case, because "total recall" exists in the main storage, the lookup never
> reaches the lowercase storage.
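
The lookup order described above can be sketched roughly as follows. This is not Spotlight's code: `main_store`, `lowercase_store` and `lookup` are hypothetical names, and the store contents (apart from the 4/55 counts) are invented for illustration.

```python
# Hypothetical stand-ins for the two storages.
main_store = {
    "total recall": 4 / 55,  # lowercase form that was explicitly annotated
    "Total Recall": 0.9,     # invented probability for the capitalized form
}
lowercase_store = {
    # invented entry: artificial lowercase form -> original form in main store
    "total recall": "Total Recall",
}


def lookup(surface_form):
    """Return (storage name, matched form); the main storage is checked first."""
    if surface_form in main_store:
        return ("main", surface_form)
    # Only reached when the main storage has no entry at all.
    if surface_form in lowercase_store:
        return ("lowercase", lowercase_store[surface_form])
    return None


print(lookup("total recall"))  # ('main', 'total recall')
```

Because the main storage already holds "total recall", the lowercase entry pointing at "Total Recall" is never consulted, even if the main-storage entry is later filtered out for its low probability.
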
>
> Another issue I know of (which can also affect the spotting of lowercase
> forms) is the discount mechanism; there is an open issue about it. It
> affects surface forms which are subparts of others. For example, "apple"
> is a SF, but it is also a substring of another SF: "Apple macbook pro".
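
To see which surface forms could be exposed to this, one can list the forms that occur inside longer ones. The sketch below does only that; it does not reproduce the discount mechanism itself. The function names are hypothetical, and the case-insensitive, token-based comparison is an assumption based on the "apple" example.

```python
def is_subpart(short_tokens, long_tokens):
    """True if short_tokens occurs as a contiguous run inside long_tokens."""
    n = len(short_tokens)
    return any(
        long_tokens[i:i + n] == short_tokens
        for i in range(len(long_tokens) - n + 1)
    )


def forms_inside_longer_forms(surface_forms):
    """Map each surface form to the longer forms that contain it."""
    # Assumption: compare lowercased, whitespace-separated tokens.
    tokens = {sf: sf.lower().split() for sf in surface_forms}
    result = {}
    for short, short_toks in tokens.items():
        longer = [
            other for other, other_toks in tokens.items()
            if len(other_toks) > len(short_toks)
            and is_subpart(short_toks, other_toks)
        ]
        if longer:
            result[short] = longer
    return result


print(forms_inside_longer_forms(["apple", "Apple macbook pro", "google"]))
# {'apple': ['Apple macbook pro']}
```
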
>
> ======
>
> Checking the French case "Hétérozygote": there is a "hétérozygote" with
> an annotation probability of 0.67 in the main surface form storage.
>
> My guess here is that there is an issue with the FSA, since the FSA should
> match "hétérozygote" as a candidate surface form and then the spotter
> should find it in the main (uppercase) storage.
>
> I would say that most likely it is an issue with the stemmer/FSA creation.
>
>
>
> On Tue, May 20, 2014 at 8:44 AM, David Przybilla
> <[email protected]> wrote:
>
>> Hi Alex,
>>
>> I'll check the case you mentioned ("total recall")
>>
>> In my experience the problems are no longer with the FSA but with other
>> bits, for example the discounting mechanism (e.g. "google" and "apple" are
>> no longer spottable).
>>
>> On 19.05.2014 19:00, "Alex Olieman" <[email protected]> wrote:
>>
>>> I've been struggling with the same issue for over a week now, and think
>>> there is cause to re-open the issue on github.
>>> As I understand from the github discussion, there should only be a
>>> problem with the OpenNLPSpotter. However, using the FSASpotter I see
>>> exactly the same problem. Adding lowercase surface forms for resources also
>>> doesn't seem to change much.
>>>
>>> Has something changed since the issue was closed on github?
>>>
>>> A nice test case is "total recall movie": it has a completely different
>>> set of candidates than "Total Recall movie".
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>> On Thu, May 15, 2014 at 5:23 PM, Sang Venkatraman <
>>> [email protected]> wrote:
>>>
>>>> Hi -- An issue was created for this (and has been closed). I have not
>>>> had a chance to test this stuff recently but the comments should give you
>>>> some idea.
>>>>
>>>> https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/196
>>>>
>>>> Thanks,
>>>> Sang
>>>>
>>>>
>>>> On Thu, May 15, 2014 at 9:19 AM, Côme SAUVAL
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently trying to install and run dbpedia-spotlight on my own
>>>>> server.
>>>>> I have followed the "build source from Maven" tutorial and it's
>>>>> working almost fine; I only have one problem left.
>>>>>
>>>>> Here's an example of the problem:
>>>>>
>>>>> The text in which I want to spot some words is: "Quel est le lien entre
>>>>> toux, déficit hétérozygote en alpha1 anti trypsine chez un patient porteur
>>>>> de RCUH".
>>>>>
>>>>> When I run it on the demo or on the http://spotlight.sztaki.hu:2225/
>>>>> server, I get 2 results: *hétérozygote* and *trypsine*.
>>>>>
>>>>> But when I use my own server, I only get those results *if the words
>>>>> are capitalized*: "Quel est le lien entre toux, déficit *Hétérozygote* en
>>>>> alpha1 anti *Trypsine* chez un patient porteur de RCUH".
>>>>> In this case I get *Hétérozygote* and *Trypsine*, but if I leave it in
>>>>> lower case I get no results.
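
One way to reproduce this comparison is to send both variants of the sentence to a server and compare the returned surface forms. A rough sketch using only the Python standard library: it assumes the server exposes an `annotate` endpoint under its REST base URL and returns JSON with `Resources` / `@surfaceForm` keys. The base URL `http://localhost:2222/rest` and the `confidence` value are placeholders, not values from this thread.

```python
import json
import urllib.parse
import urllib.request


def annotate_url(base_url: str, text: str, confidence: float = 0.2) -> str:
    """Build a GET URL for the annotate endpoint (path assumed: <base>/annotate)."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return f"{base_url.rstrip('/')}/annotate?{query}"


def spotted_surface_forms(base_url: str, text: str) -> list:
    """Call a running server and return the surface forms it annotated."""
    request = urllib.request.Request(
        annotate_url(base_url, text), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # "Resources" and "@surfaceForm" are the JSON keys assumed here.
    return [r["@surfaceForm"] for r in payload.get("Resources", [])]


LOWER = "déficit hétérozygote en alpha1 anti trypsine"
UPPER = "déficit Hétérozygote en alpha1 anti Trypsine"
BASE = "http://localhost:2222/rest"  # placeholder for your own server

# Printing the URLs needs no server; spotted_surface_forms() does.
for text in (LOWER, UPPER):
    print(annotate_url(BASE, text))
```

Calling `spotted_surface_forms(BASE, LOWER)` and `spotted_surface_forms(BASE, UPPER)` against a running instance would then show whether only the capitalized variant yields results.
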
>>>>>
>>>>> Has anyone had the same issue?
>>>>>
>>>>> How can I configure the server to resolve this issue?
>>>>>
>>>>> Thanks a lot for your help,
>>>>>
>>>>> Côme
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>>>> Get unparalleled scalability from the best Selenium testing platform
>>>>> available
>>>>> Simple to use. Nothing to install. Get started now for free."
>>>>> http://p.sf.net/sfu/SauceLabs
>>>>> _______________________________________________
>>>>> Dbp-spotlight-users mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users