Hi Joseph:
The reason for your results is the "Min Label Score"
(enhancer.engines.linking.minLabelScore) parameter of the
EntityLinkingEngine.
Copied from [1]
* Min Label Score (enhancer.engines.linking.minLabelScore)
[0..1]::double: The "Label Score" [0..1] represents how much of the
Label of an Entity matches the Text. It compares the number of
Tokens of the Label with the number of Tokens matched to the Text.
Inexact matches for Tokens, or Tokens of the label that appear in a
different order than in the text, also reduce this score. Entities
are only considered if at least one of their labels scores higher than
the minimum for all three of Min Label Score, Min Text Match Score and
Min Match Score.
The default value of this parameter is "0.75".
In your case, where "cette plombier moustachu" is matched against "le
plombier moustachu", the actual label score is only "0.667" (2 of the 3
tokens of the label match the text). Because of that, the Entity is
not linked in that case.
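The arithmetic can be sketched as follows. This is only a simplified
illustration, not the engine's actual implementation: the function name
label_score is hypothetical, it counts exact token matches only, and it
ignores the penalties for inexact matches and token order that the
documentation mentions. Which of the two phrases is the label and which
is the text is also an assumption here; the ratio is 2/3 either way.

```python
def label_score(label_tokens, text_tokens):
    """Fraction of label tokens that have an exact match in the text.

    Simplified, hypothetical stand-in for the engine's label score.
    """
    text = {token.lower() for token in text_tokens}
    matched = sum(1 for token in label_tokens if token.lower() in text)
    return matched / len(label_tokens)


DEFAULT_MIN_LABEL_SCORE = 0.75  # default value stated above

label = ["le", "plombier", "moustachu"]    # assumed to be the Entity label
text = ["cette", "plombier", "moustachu"]  # assumed to be the analysed text

score = label_score(label, text)
print(round(score, 3))                   # 0.667
print(score >= DEFAULT_MIN_LABEL_SCORE)  # False: Entity is not linked
print(score >= 0.55)                     # True: linked with a lower minimum
```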
If you would like to link Entities where two out of three tokens match
the text, you should lower the configured minLabelScore to a
value < "0.66", e.g.
enhancer.engines.linking.minLabelScore="0.55"
NOTE: As this property is not included in the configuration dialog on
the config tab of the Felix Webconsole, you will need to set it directly
via the config file of the engine instance. See [2] for how to manage
your configuration within the 'stanbol/fileinstall' folder.
To create a configuration file for the EntityhubLinkingEngine, follow
these steps:
1. To get a config file to start with, look in
'stanbol/config/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine'
and take the '{uid}.config' file of the engine you are currently
using.
2. Next, rename the file to
"org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine-{configname}"
where {configname} should be a human-readable name for your
configuration.
3. Now you can edit the file using a text editor:
* remove the "service.bundleLocation", "service.factoryPid" and
"service.pid" keys. Those are set by the OSGi environment and should
not be in the config
* add the configuration of the minLabelScore property
'enhancer.engines.linking.minLabelScore="0.55"'
* you can change/add other configuration parameters as described in [1]
4. Finally you need to (1) delete the current configuration of your
engine via the "config" tab of the Felix Webconsole and (2) copy your
configuration file to the 'stanbol/fileinstall' folder.
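If you prefer to script step 3 instead of editing by hand, a minimal
sketch could look like the following. The function name
prepare_engine_config and both path arguments are hypothetical. It
assumes that each of the three "service.*" keys sits on a single line
of the form key="value"; all other lines are copied unchanged. It does
not delete the old configuration in the Felix Webconsole, so step 4 (1)
still has to be done manually.

```python
from pathlib import Path

# Keys set by the OSGi environment; step 3 says to remove them.
OSGI_MANAGED_KEYS = (
    "service.bundleLocation",
    "service.factoryPid",
    "service.pid",
)
MIN_LABEL_SCORE_KEY = "enhancer.engines.linking.minLabelScore"


def prepare_engine_config(source: Path, target: Path,
                          min_label_score: str = "0.55") -> None:
    """Copy a config file, dropping OSGi-managed keys and setting minLabelScore."""
    kept = []
    for line in source.read_text(encoding="utf-8").splitlines():
        key = line.split("=", 1)[0].strip()
        if key in OSGI_MANAGED_KEYS or key == MIN_LABEL_SCORE_KEY:
            continue  # dropped here; minLabelScore is appended again below
        kept.append(line)
    kept.append(f'{MIN_LABEL_SCORE_KEY}="{min_label_score}"')
    target.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

You would call it with the '{uid}.config' file from step 1 as source
and the file named as in step 2, inside 'stanbol/fileinstall', as
target.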
best
Rupert
[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entity-linker-configuration
[2] http://stanbol.staging.apache.org/docs/trunk/production-mode/partial-updates.html
On Thu, Apr 18, 2013 at 5:22 PM, Rupert Westenthaler
<[email protected]> wrote:
> On Thu, Apr 18, 2013 at 4:04 PM, Joseph M'Bimbi-Bene
> <[email protected]> wrote:
>> Thank you for your answer.
>>
>> But I misunderstood your indication. I mean, I thought I could specify a
>> specific word to be linkable or matchable.
>>
>> I have another question: how can I see the score when there is no match?
>>
>
> If there is no match then there is no score.
>
> [..log..]
>> ?
>
> OK I can see your point. This is indeed a strange behavior. To be
> honest I have not tested much in settings without POS tags. So this
> might be as well a bug.
>
> I will try to reproduce this to have a detailed look what is going on.
>
> best
> Rupert
>
>>
>> I tried nlp2rdf, and in the resulting RDF I cannot see it (maybe I missed
>> it though; there is so much information displayed that I am kind of lost)
>>
>>
>> 2013/4/18 Rupert Westenthaler <[email protected]>
>>
>>> On Thu, Apr 18, 2013 at 3:16 PM, Joseph M'Bimbi-Bene
>>> <[email protected]> wrote:
>>> > I don't see the option, can you give me the procedure or a more precise
>>> > indication please ?
>>> >
>>>
>>> If you do not want to use POS tagging, then the options are limited:
>>>
>>> * uc {NONE/MATCH/LINK}::string - the Upper Case Token Mode allows to
>>> configure how upper case words are treated. There are three possible
>>> modes: (1) NONE: defines that they are not specially treated; (2)
>>> MATCH defines that they are considered as matchable tokens
>>> (independent of the POS tag or the token length); (3) LINK: defines
>>> that they are in any case linked with the vocabulary. The default is
>>> "LINK" - as upper case words often represent named entities - with the
>>> exception of German ('de') where the mode is set to MATCH - as all
>>> Nouns in German are upper case.
>>>
>>> e.g.
>>>
>>>
>>> org.apache.stanbol.enhancer.engines.keywordextraction.processedLanguages=["fr;uc\=MATCH"]
>>> enhancer.engines.linking.minSearchTokenLength=3
>>>
>>> This would MATCH all upper case and words with three or more chars.
>>>
>>> However, if your vocabulary does contain Entities that would appear in
>>> texts as specific POS (e.g. Nouns) I would really recommend you to
>>> give POS tagging a try.
>>>
>>> If you like you can try to process some of your texts using the
>>>
>>> * DBpedia proper noun linking on
>>> http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun
>>> * Freebase proper noun linking currently running in an early test
>>> version on
>>> http://dev.iks-project.eu:8083/enhancer/chain/freebase-proper-noun
>>>
>>> both chains do use the talismane integration [1] for NLP processing
>>>
>>> best
>>> Rupert
>>>
>>> > best
>>> > Rupert
>>> >
>>> >
>>> > [1] https://github.com/westei/stanbol-talismane
>>> > [2] http://dev.iks-project.eu:8081/enhancer/chain/NIF-demo
>>> > [3]
>>> >
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#linking-process
>>> >
>>> > --
>>> > | Rupert Westenthaler [email protected]
>>> > | Bodenlehenstraße 11 ++43-699-11108907
>>> > | A-5500 Bischofshofen
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen