Re: problem with entity recognition or linking in french

Rupert Westenthaler Wed, 17 Apr 2013 21:24:12 -0700

Hi Joseph,

What engines do you use for NLP processing of French texts? OpenNLP
has no models for French, so if you just configure those engines you
will have tokens, but no detected sentences, POS tags or NER
annotations. In this case the EntityhubLinkingEngine falls back to
linking all tokens of the text that are at least "Min Search Token
Length" (default = 3) characters long. So assuming that your
configuration of the EnhancementChain is as described, "plombier" and
"moustachu" should be linked with the vocabulary.
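
To illustrate the fallback rule, here is a minimal sketch in Python. It
is not the engine's code: the function name and the regex-based
tokenization are assumptions made for this example, and only the length
threshold (default 3) comes from the description above.

```python
import re


def fallback_linkable_tokens(text, min_search_token_length=3):
    """Return the tokens that would be looked up when no POS tags exist.

    Hypothetical helper: tokenization here is a naive split on word
    characters, which is simpler than what a real tokenizer does.
    """
    tokens = re.findall(r"\w+", text)
    # Keep only tokens that reach the minimum search token length.
    return [t for t in tokens if len(t) >= min_search_token_length]


print(fallback_linkable_tokens("le plombier moustachu"))
# prints ['plombier', 'moustachu']
```

"le" is dropped because it has only two characters, which is why only
the other two words are expected to be looked up.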

BTW: If you are interested in processing French texts with Stanbol you
should consider using the Stanbol Talismane integration [1].

Problems can also arise with very short texts (1) because the language
might not be correctly detected and (2) because POS and NER annotations
do not work very well in such scenarios. So please check what language
was detected for your input. If it was classified as one of the
supported languages (e.g. pt) you might also get unexpected results.

Regarding the matching of skos:altLabel: The EntityhubLinkingEngine
links only to a single field. By default this is set to rdfs:label. If
you want to match against both skos:prefLabel and skos:altLabel, then
there are two possibilities: (1) copy the values of both skos:prefLabel
and skos:altLabel to rdfs:label and configure rdfs:label for the
engine, or (2) configure two instances of the EntityhubLinkingEngine:
one for skos:prefLabel and the other for skos:altLabel.
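
Option (1) amounts to a small preprocessing step on the vocabulary
before it is indexed. The sketch below shows the idea on plain
(subject, predicate, object) tuples. The function name and the tuple
representation are assumptions for this example, and it does not use
any Stanbol API.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LABEL_PROPERTIES = {SKOS + "prefLabel", SKOS + "altLabel"}


def copy_labels_to_rdfs(triples):
    """Return the input triples plus an rdfs:label copy of each SKOS label."""
    original = list(triples)
    result = list(original)
    seen = set(original)
    for subject, predicate, obj in original:
        if predicate in LABEL_PROPERTIES:
            copy = (subject, RDFS_LABEL, obj)
            # Avoid duplicates if the label is already present.
            if copy not in seen:
                seen.add(copy)
                result.append(copy)
    return result
```

With the labels copied this way, a single engine instance configured
for rdfs:label sees both the preferred and the alternative labels.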

If you want to know what happens ...

(1) you can add a logger configuration that sets the log level for
"org.apache.stanbol.enhancer.engines.entitylinking" to DEBUG. For
that, go to the "Configuration" tab of the Felix Web Console and add a
new "Apache Sling Logging Logger Configuration". At DEBUG level,
detailed information about the linking process is printed to the log.

(2) if you want detailed information about the NLP processing results
to be added to the enhancement results you can add the nlp2rdf
enhancement engine to your Stanbol instance and your enhancement
chain. For that you first need to install the bundle of this engine in
the Stanbol environment (e.g. by using the Bundles tab of the Felix
Web Console) and after that add the engine to your chain configuration.
This engine writes detailed information about the NLP processing
results to the enhancement results. You can test it on [2].
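
To look at the results of such a chain from a script, you can POST the
text to the chain's endpoint. The sketch below only builds the request.
The base URL (a local instance on port 8080) is an assumption, the
chain name "NIF-demo" is taken from the demo URL in [2], and the helper
name is made up for this example.

```python
from urllib.request import Request


def build_enhancer_request(base_url, chain, text, accept="text/turtle"):
    """Build a POST request for the enhancement chain with the given name."""
    url = f"{base_url.rstrip('/')}/enhancer/chain/{chain}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": accept,
        },
        method="POST",
    )


# Assumed local instance; adjust host, port and chain name to your setup.
request = build_enhancer_request(
    "http://localhost:8080", "NIF-demo", "le plombier moustachu"
)
# To actually send it: urllib.request.urlopen(request).read()
```

The returned RDF should contain the enhancements produced by the
engines of that chain, so it is also a convenient way to check which
language was detected for a given input.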

On Wed, Apr 17, 2013 at 4:16 PM, Joseph M'Bimbi-Bene
<[email protected]> wrote:
> Also, why is "le plombier moustachu" recognized ? why is there a difference
> ?

No idea. Maybe the detected language changes when a word is added.

>
> Another related question is: what is the pos type of a token when i
> deactivate the POStagging ?

Then there are simply no POS annotations and the length of the words
is used to decide whether they are linked or not. Note that, regardless
of that, upper case words always trigger searches in the linked
vocabulary.

> Are they all proper noun ? what happens ? how can i parameter that ?

The EntityLinking engine distinguishes

* Linkable Tokens: These are words that are linked with the Vocabulary.
This means that the engine will issue queries in the controlled
vocabulary for those tokens.
* Matchable Tokens: Matchable tokens are used to refine queries. For
the matching of entity labels with the text those words are treated in
the same way as linkable words. So the main difference is that
matchable words alone will not cause the engine to query for Entities
in the Controlled Vocabulary.
* Other Tokens: All other tokens in the text are not used for searches
in the configured vocabulary. However, during the matching of labels
with the text they are considered, as they might also be present in
labels of entities.

The rules for classifying words as Linkable and Matchable can be
controlled by the configuration of the EntityLinkingEngine. You can
find details about that in the documentation at [3].
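
As a toy model of how the three token classes differ when it comes to
queries, consider the following sketch. The names and the shape of the
"query plan" are invented for illustration and do not mirror the
engine's internals; the token classification is simply given as input.

```python
from enum import Enum


class TokenKind(Enum):
    LINKABLE = "linkable"    # triggers a vocabulary query
    MATCHABLE = "matchable"  # refines a query, never triggers one
    OTHER = "other"          # not used for queries


def plan_queries(classified_tokens):
    """Return one query plan per linkable token.

    classified_tokens: list of (token, TokenKind) pairs in text order.
    """
    refinements = [
        token for token, kind in classified_tokens
        if kind is TokenKind.MATCHABLE
    ]
    return [
        {"trigger": token, "terms": [token] + refinements}
        for token, kind in classified_tokens
        if kind is TokenKind.LINKABLE
    ]
```

In this model a text that contains only matchable tokens yields no
query at all, while a single linkable token is enough to start a
search that the matchable tokens then narrow down. The later matching
of entity labels against the text, where all tokens are considered, is
not modelled here.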

best
Rupert

[1] https://github.com/westei/stanbol-talismane
[2] http://dev.iks-project.eu:8081/enhancer/chain/NIF-demo
[3]
http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#linking-process

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
