On Fri, Apr 19, 2013 at 5:49 PM, Joseph M'Bimbi-Bene <[email protected]> wrote:
> Hello Rupert,
>
> since I am on it, why is "le" even considered for the matching? I thought
> labels were tokenized and tokens with length < 3 were not even
> considered for the matching with the default config, or am I mixing up
> different concepts?

Only Tokens in the Text are processed as described. For Labels no
processing is done.

> Do I have to code my own labelTokenizer? Since we intend to sell a product
> to a client who has no idea how that thing works and will basically enter
> labels in an Excel file or something of that sort, I would like to have
> that behaviour.
>

Never tried it, but this should be possible.

best
Rupert

>
> 2013/4/19 Joseph M'Bimbi-Bene <[email protected]>
>
>> I forgot a screenshot in the document.
>>
>>
>> 2013/4/19 Joseph M'Bimbi-Bene <[email protected]>
>>
>>> I saw those lines in the documentation and actually tried to insert the
>>> lines directly in the .config file of the engine in
>>> {stanbol-install-dir}/stanbol/fileinstall.
>>> Then I saw your answer and tried it, but it doesn't work.
>>> I prepared a PDF doc with screenshots describing what I did and the
>>> results, I think it will be better than
>>>
>>>
>>> 2013/4/19 Rupert Westenthaler <[email protected]>
>>>
>>>> Hi Joseph:
>>>>
>>>> The reason for your results is the "Min Label Score"
>>>> (enhancer.engines.linking.minLabelScore) parameter of the
>>>> EntityLinkingEngine.
>>>>
>>>> Copied from [1]
>>>>
>>>> * Min Label Score (enhancer.engines.linking.minLabelScore)
>>>> [0..1]::double: The "Label Score" [0..1] represents how much of the
>>>> Label of an Entity matches with the Text. It compares the number of
>>>> Tokens of the Label with the number of Tokens matched to the Text.
>>>> Inexact matches for Tokens, or Tokens within the label that appear
>>>> in a different order than in the text, also reduce this score. Entities
>>>> are only considered if at least one of their labels scores higher than
>>>> the minimum for all three of Min Label Score, Min Text Match Score and
>>>> Min Match Score.
>>>>
>>>> The default value of this parameter is "0.75".
>>>>
>>>> In your case where "cette plombier moustachu" is matched against "le
>>>> plombier moustachu" the actual label match score is only "0.667" (2/3
>>>> tokens of the label do match the text). Because of that the Entity is
>>>> not linked in that case.
>>>>
>>>> If you would like to link Entities where two out of three tokens match
>>>> with the text you should lower the configuration of minLabelScore to
>>>> a value < "0.66", e.g.
>>>>
>>>> enhancer.engines.linking.minLabelScore="0.55"
>>>>
>>>> NOTE: As this property is not included in the configuration dialog of
>>>> the config tab of the Felix Webconsole you will need to set it directly
>>>> via the config file of the engine instance. See [2] for how to manage
>>>> your configuration within the 'stanbol/fileinstall' folder.
>>>>
>>>> To create a configuration file for the EntityhubLinkingEngine you can
>>>> follow these steps:
>>>>
>>>> 1. To get a config file to start with just go look at
>>>>
>>>> 'stanbol/config/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine'
>>>> and take the '{uid}.config' file of the engine you are currently
>>>> using.
>>>>
>>>> 2. Next you will need to name the file like
>>>>
>>>> "org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine-{configname}"
>>>> where {configname} should be a human readable name for your
>>>> configuration.
>>>>
>>>> 3. Now you can edit the file using a text editor:
>>>>
>>>> * remove the "service.bundleLocation", "service.factoryPid" and
>>>> "service.pid" keys. Those are set by the OSGI environment and should
>>>> not be in the config
>>>> * add the configuration of the minLabelScore property
>>>> 'enhancer.engines.linking.minLabelScore="0.55"'
>>>> * you can change/add other configuration parameters as described in
>>>> [1]
>>>>
>>>> 4. Finally you need to (1) delete the current configuration of your
>>>> engine via the "config" tab of the Felix Webconsole and (2) copy your
>>>> configuration file to the 'stanbol/fileinstall' folder.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> [1]
>>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entity-linker-configuration
>>>> [2]
>>>> http://stanbol.staging.apache.org/docs/trunk/production-mode/partial-updates.html
>>>>
>>>> On Thu, Apr 18, 2013 at 5:22 PM, Rupert Westenthaler
>>>> <[email protected]> wrote:
>>>> > On Thu, Apr 18, 2013 at 4:04 PM, Joseph M'Bimbi-Bene
>>>> > <[email protected]> wrote:
>>>> >> Thank you for your answer.
>>>> >>
>>>> >> But I misunderstood your indication. I mean, I thought I could
>>>> >> specify a specific word to be linkable or matchable.
>>>> >>
>>>> >> I have another question: how can I see the score when there is no
>>>> >> match?
>>>> >>
>>>> >
>>>> > If there is no match then there is no score.
>>>> >
>>>> > [..log..]
>>>> >> ?
>>>> >
>>>> > OK I can see your point. This is indeed a strange behavior. To be
>>>> > honest I have not tested much in settings without POS tags. So this
>>>> > might as well be a bug.
>>>> >
>>>> > I will try to reproduce this to have a detailed look at what is going on.
>>>> >
>>>> > best
>>>> > Rupert
>>>> >
>>>> >>
>>>> >> I tried nlp2rdf, and in the resulting RDF, I cannot see it (maybe I
>>>> >> missed it though, there is so much information displayed, I am kinda
>>>> >> lost)
>>>> >>
>>>> >>
>>>> >> 2013/4/18 Rupert Westenthaler <[email protected]>
>>>> >>
>>>> >>> On Thu, Apr 18, 2013 at 3:16 PM, Joseph M'Bimbi-Bene
>>>> >>> <[email protected]> wrote:
>>>> >>> > I don't see the option, can you give me the procedure or a more
>>>> >>> > precise indication please?
>>>> >>> >
>>>> >>>
>>>> >>> If you do not want to use POS tagging, then the options are limited:
>>>> >>>
>>>> >>> * uc {NONE/MATCH/LINK}::string - the Upper Case Token Mode allows you to
>>>> >>> configure how upper case words are treated. There are three possible
>>>> >>> modes: (1) NONE: defines that they are not specially treated; (2)
>>>> >>> MATCH: defines that they are considered as matchable tokens
>>>> >>> (independent of the POS tag or the token length); (3) LINK: defines
>>>> >>> that they are in any case linked with the vocabulary. The default is
>>>> >>> "LINK" - as upper case words often represent named entities - with the
>>>> >>> exception of German ('de') where the mode is set to MATCH - as all
>>>> >>> Nouns in German are upper case.
>>>> >>>
>>>> >>> e.g.
>>>> >>>
>>>> >>> org.apache.stanbol.enhancer.engines.keywordextraction.processedLanguages=["fr;uc\=MATCH"]
>>>> >>> enhancer.engines.linking.minSearchTokenLength=3
>>>> >>>
>>>> >>> This would MATCH all upper case words and all words with three or more
>>>> >>> chars.
>>>> >>>
>>>> >>> However if your vocabulary does contain Entities that would appear in
>>>> >>> texts as a specific POS (e.g. Nouns) I would really recommend you to
>>>> >>> give POS tagging a try.
>>>> >>>
>>>> >>> If you like you can try to process some of your texts using the
>>>> >>>
>>>> >>> * DBpedia proper noun linking on
>>>> >>> http://dev.iks-project.eu:8081/enhancer/chain/dbpedia-proper-noun
>>>> >>> * Freebase proper noun linking currently running in an early test
>>>> >>> version on
>>>> >>> http://dev.iks-project.eu:8083/enhancer/chain/freebase-proper-noun
>>>> >>>
>>>> >>> both chains do use the talismane integration [1] for NLP processing
>>>> >>>
>>>> >>> best
>>>> >>> Rupert
>>>> >>>
>>>> >>> > best
>>>> >>> > Rupert
>>>> >>> >
>>>> >>> >
>>>> >>> > [1] https://github.com/westei/stanbol-talismane
>>>> >>> > [2] http://dev.iks-project.eu:8081/enhancer/chain/NIF-demo
>>>> >>> > [3]
>>>> >>> > http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#linking-process
>>>> >>> >
>>>> >>> > --
>>>> >>> > | Rupert Westenthaler [email protected]
>>>> >>> > | Bodenlehenstraße 11 ++43-699-11108907
>>>> >>> > | A-5500 Bischofshofen
>>>>
>>
>

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
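
The label score arithmetic discussed in this thread (2 of 3 label tokens matched gives 0.667, which is below the default minimum of 0.75) can be illustrated with a small Python sketch. This is not Stanbol's implementation: the function names `label_score` and `is_linked` are hypothetical, tokenization is a plain whitespace split, and the penalties for inexact matches and token order mentioned in the documentation are ignored. Treating a score equal to the minimum as accepted is also an assumption.

```python
def label_score(label: str, text: str) -> float:
    """Fraction of the label's tokens that also occur in the text.

    Simplified illustration only: whitespace tokenization, exact
    (case-insensitive) token matches, no penalty for token order.
    """
    label_tokens = label.lower().split()
    text_tokens = set(text.lower().split())
    if not label_tokens:
        return 0.0
    matched = sum(1 for token in label_tokens if token in text_tokens)
    return matched / len(label_tokens)


def is_linked(label: str, text: str, min_label_score: float = 0.75) -> bool:
    """Hypothetical check against the minLabelScore threshold.

    Assumption: a score equal to the threshold counts as accepted.
    """
    return label_score(label, text) >= min_label_score


if __name__ == "__main__":
    label = "le plombier moustachu"
    text = "cette plombier moustachu"
    print(round(label_score(label, text), 3))             # 0.667
    print(is_linked(label, text))                         # False (default 0.75)
    print(is_linked(label, text, min_label_score=0.55))   # True
```

With the default of 0.75 the entity from the example is rejected, and with minLabelScore lowered to 0.55 it is accepted, which matches the behaviour described above.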
