Hello, I'm using Stanbol for Named Entity Recognition tasks with custom
vocabularies, and I am having problems with labels that span multiple tokens.

For example, I have this entity:

<rdf:Description rdf:about="http://example.org/resource#Mario">
        <skos:prefLabel>Mario</skos:prefLabel>
        <skos:altLabel>le plombier moustachu</skos:altLabel>
        <rdf:type rdf:resource="http://example.org/concept#gentil"/>
        <rdf:type rdf:resource="http://example.org/concept#humain"/>
</rdf:Description>

I only activated the OpenNLP tokenizer in the enhancement chain, with its
default configuration, and left nearly all other parameters unchanged.

I tried to enhance the following simple text: "plombier moustachu", and the
entity is not recognized. But when I submit "le plombier moustachu", my
entity is recognized.

From what I understand, the altLabel is also tokenized during the lookup
process, so the "le" token should be skipped and I should get a perfect
match between "plombier moustachu" in the text and the label "le plombier
moustachu".
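To make my expectation concrete, here is a rough sketch in Python of the matching I assumed. This is NOT Stanbol's code: the whitespace tokenizer and the set of skipped words (SKIPPED_TOKENS) are my own assumptions, used only for illustration.

```python
# Hypothetical model of the label matching I expected.
# NOT Stanbol's implementation: the whitespace tokenizer and the
# SKIPPED_TOKENS set are assumptions made for illustration only.

SKIPPED_TOKENS = {"le", "la", "les"}  # assumed French function words


def content_tokens(text: str) -> list[str]:
    """Lower-case, split on whitespace and drop the assumed skipped tokens."""
    return [t for t in text.lower().split() if t not in SKIPPED_TOKENS]


def expected_match(text: str, label: str) -> bool:
    """True if text and label have the same sequence of content tokens."""
    return content_tokens(text) == content_tokens(label)


label = "le plombier moustachu"
for text in ("plombier moustachu", "le plombier moustachu"):
    print(f"{text!r}: {expected_match(text, label)}")
```

Under this model both inputs match the altLabel, which is why the different results I get are confusing to me.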

Also, why is "le plombier moustachu" recognized? Why is there a difference?

Another related question: what is the POS type of a token when I
deactivate POS tagging? Are all tokens treated as proper nouns? What
happens in that case, and how can I configure it?

Thank you, have a nice day
