Hello, I'm using Stanbol for Named Entity Recognition tasks with custom vocabularies. I am having problems with labels that span multiple tokens.
For example, I have this entity:

    <rdf:Description rdf:about="http://example.org/resource#Mario">
      <skos:prefLabel>Mario</skos:prefLabel>
      <skos:altLabel>le plombier moustachu</skos:altLabel>
      <rdf:type>http://example.org/concept#gentil</rdf:type>
      <rdf:type>http://example.org/concept#humain</rdf:type>
    </rdf:Description>

I only activated the OpenNLP tokenizer in the enhancement chain, with its default configuration, and left nearly all other parameters unchanged.

When I try to enhance the simple text "plombier moustachu", the entity is not recognized. But when I submit "le plombier moustachu", the entity is recognized.

From what I understood, the altLabel is also tokenized during the lookup process, so the "le" token should be set aside and "plombier moustachu" in the text should match the label "le plombier moustachu". Why is only "le plombier moustachu" recognized? Why is there a difference?

A related question: what is the POS tag of a token when I deactivate POS tagging? Are all tokens treated as proper nouns? What happens in that case, and how can I configure it?

Thank you, and have a nice day.
