Hi Pablo,

Thinking about this more: if LingPipeSpotter uses exact dictionary-based chunking, how is it possible that a stopword was spotted when it is only part of a surface form? As far as I know, LingPipe's dictionary implementation is based on exact matching.
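
To make the question concrete, here is a toy sketch of exact dictionary matching over whitespace tokens. It is not LingPipe's code and does not reproduce its ExactDictionaryChunker; the function name, the `max_len` parameter and the sample dictionaries are all made up for illustration. Under these assumptions, a stopword is only spotted on its own when it is itself an entry in the dictionary:

```python
def exact_dictionary_spots(tokens, dictionary, max_len=5):
    """Return (start, end, phrase) for every token span that exactly equals a dictionary entry."""
    spots = []
    for start in range(len(tokens)):
        # Try every span beginning at `start`, up to max_len tokens long.
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            phrase = " ".join(tokens[start:end])
            if phrase in dictionary:
                spots.append((start, end, phrase))
    return spots


tokens = "la Universidad de Sevilla".split()

# Hypothetical dictionaries: one clean, one polluted with the stopword "de".
clean = {"Universidad de Sevilla"}
polluted = clean | {"de"}

clean_spots = exact_dictionary_spots(tokens, clean)
polluted_spots = exact_dictionary_spots(tokens, polluted)

print(clean_spots)     # [(1, 4, 'Universidad de Sevilla')]
print(polluted_spots)  # [(1, 4, 'Universidad de Sevilla'), (2, 3, 'de')]
```

If the real spotter behaves like this toy version, the stopwords I see spotted would have to be entries in the dictionary themselves.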

Regards

On 15/12/12 17:39, Pablo N. Mendes wrote:


Hi Rafa,
The part that is perhaps confusing here is that the stopword list is used in multiple places. The SpanishAnalyzer removes them from the context index (used in disambiguation). What you report is that you see stopwords being spotted, which is a problem with your spotter dictionary (and the class that created it) or the spotter implementation.

Try this:
1) Check whether your *indexing.es.properties* configuration points to the right stopwords file for Spanish. If yes, check whether that file contains the undesired words you see spotted. If not, that's your problem.
2) Check whether surfaceForms.tsv contains these spurious stopwords. If yes, then you need to double-check what's happening in IndexLingPipeSpotter. Create a small surfaceForms.tsv and stopwords.txt and step through the code.
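
The second check can be sketched as a small script. This is only a minimal sketch under assumptions: it supposes that the stopwords file has one word per line and that the surface form sits in the first tab-separated column of surfaceForms.tsv (adjust `sf_column` if your file is laid out differently). The function name and the sample rows are hypothetical.

```python
def find_stopword_surface_forms(surface_form_lines, stopword_lines, sf_column=0):
    """Return surface forms that are exactly a stopword (case-insensitive)."""
    stopwords = {w.strip().lower() for w in stopword_lines if w.strip()}
    hits = []
    for line in surface_form_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= sf_column:
            continue  # skip malformed rows
        surface_form = fields[sf_column].strip()
        if surface_form.lower() in stopwords:
            hits.append(surface_form)
    return hits


# Tiny in-memory stand-ins for surfaceForms.tsv and stopwords.txt
# (assumed layout: surface form, TAB, resource name; rows are invented).
sample_surface_forms = [
    "Madrid\tMadrid",
    "de\tSome_Resource",
    "Universidad de Sevilla\tUniversidad_de_Sevilla",
]
sample_stopwords = ["de", "la", "el"]

spurious = find_stopword_surface_forms(sample_surface_forms, sample_stopwords)
print(spurious)  # ['de']
```

With real files you would pass open file handles instead of the lists, for example `find_stopword_surface_forms(open("surfaceForms.tsv", encoding="utf-8"), open("stopwords.txt", encoding="utf-8"))`. A non-empty result means stopwords made it into the spotter dictionary.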

Which spotter are you using? I am assuming it is LingPipeSpotter.

Cheers
pablo

On Dec 15, 2012 12:13 AM, "Rafa Haro" <[email protected]> wrote:

    Hi all,

    I'm not sure whether this is a bug, a problem with my local
    installation, or an issue in the project. While testing our local
    installation in Spanish, we are having problems with the list of
    stopwords. I'm almost sure that the list is being used properly
    during indexing with Lucene's SpanishAnalyzer. But when we annotate
    a text in Spanish, some stopwords are spotted as surface forms and
    finally linked to a candidate. That also sometimes happens with
    punctuation marks (dots, quotes, ...).

    Actually, I don't know whether the system applies stopword removal
    to the input text, but I assumed it would, to prevent this
    behaviour. Am I right?

    Thanks. Regards
    This message should be regarded as confidential. If you have
    received this email in error please notify the sender and destroy
    it immediately. Statements of intent shall only become binding
    when confirmed in hard copy by an authorised signatory.

    Zaizi Ltd is registered in England and Wales with the registration
    number 6440931. The Registered Office is 222 Westbourne Studios,
    242 Acklam Road, London W10 5JJ, UK.


    
    _______________________________________________
    Dbp-spotlight-users mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users



