entitylinking.html

buildbot Sun, 09 Jun 2013 22:43:40 -0700

Author: buildbot
Date: Mon Jun 10 05:43:00 2013
New Revision: 865093

Log:
Staging update by buildbot for stanbol


Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun 10 05:43:00 2013
@@ -1 +1 @@
-1491339
+1491341

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html Mon Jun 10 05:43:00 2013
@@ -135,13 +135,12 @@
 
 
 <p>where:</p>
-<div class="codehilite"><pre><span class="o">*</span> <span 
class="p">{</span><span class="n">lt</span><span class="p">}</span> <span 
class="p">...</span> <span class="n">the</span> <span 
class="n">_Linkable</span> <span class="n">Token_</span> <span 
class="k">for</span> <span class="n">that</span> <span class="n">the</span> 
<span class="n">search</span> <span class="n">is</span> <span 
class="n">issued</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">at</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">additional</span> <span class="n">_Linkable</span><span 
class="o">-</span><span class="n">_</span> <span class="n">or</span> <span 
class="n">_Matchable</span> <span class="n">Tokens_</span> <span 
class="n">included</span> <span class="n">in</span> <span class="n">the</span> 
<span class="n">search</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">lang</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">the</span> <span class="n">language</span> <span class="n">of</span> 
<span class="n">the</span> <span class="n">text</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">dl</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">the</span> <span class="n">configured</span> <span 
class="n">_Default</span> <span class="n">Matching</span> <span 
class="n">Language_</span><span class="p">.</span> <span class="n">If</span> 
<span class="p">{</span><span class="n">df</span><span class="p">}</span> <span 
class="o">==</span> <span class="p">{</span><span class="n">lang</span><span 
class="p">}</span> <span class="n">than</span> <span class="n">the</span> <span 
class="n">or</span> <span class="n">term</span><span class="p">(</span><span 
class="n">s</span><span class="p">)</span> <span class="k">for</span> <span 
class="n">the</span> <span class="p">{</span><span class="n">dl</span><span 
class="p">}</span> <span class="n">are</span> <span class="n">omitted</span>
-</pre></div>
-
-
+<ul>
+<li>{lt} ... the <em>Linkable Token</em> for that the search is issued</li>
+<li>{at} ... additional <em>Linkable-</em> or <em>Matchable Tokens</em> 
included in the search</li>
+<li>{lang} ... the language of the text</li>
+<li>{dl} ... the configured <em>Default Matching Language</em>. If '{df} == 
{lang}' than the or term(s) for the {dl} are omitted</li>
+</ul>
 <p>For results of those queries the labels in the {lang} and {dl} are matched 
against the text. However {dl} labels are only considered if no match was found 
for labels in the language of the text. For matching labels with the Tokens of 
the text the engine need to tokenize the labels. This is done by using the 
<em>LabelTokenizer</em> interface.</p>
 <p>The matching process distinguishes between matchable and non-matchable 
Tokens as well as non-alpha-numeric Tokens that are completely ignored. 
Matching starts at the position of the <em>Linkable Token</em> for that the 
search in the configured vocabulary was issued. From this position Tokens in 
the Label are matched with Tokens in the text until the first matchable or 2nd 
non-matchable token is not found. In a second round the same is done in the 
backward direction. The configured <em>Min Token Match Factor</em> determines 
how exact tokens in the text must correspond to tokens in the label so that a 
match is considered. This is repeated for all labels of an Entity. The label 
match that covers the most tokens is than considered as the match for that 
Entity.</p>
 <p>There are various parameters that can be used to fine tune the matching 
process. But the most important decision is if one want to include suggestions 
where labels with two tokens do only match a single <em>Matchable Token</em> in 
the Text (e.g. "Barack Obama" matching "Obama" but also 1000+ "Tom {something}" 
matching "Tom"). The default configuration of the Engine excludes those but 
depending on the use case and the linked vocabulary users might want to change 
this. See the documentation of the <em>Min Matched Tokens</em> and <em>Min Labe 
Score</em> for details and examples. </p>
@@ -157,7 +156,7 @@
 <p>The configuration of the EntityLinkingEngine done by parsing a 
<em>TextProcessingConfig</em> and an <em>EntityLinkingConfig</em> in it 
constructor. Both configuration classes provide an API base configuration (via 
getter and setter) as well as an OSGI Dictionary based configuration (via a 
static method that configures a new instance by an parsed configuration).</p>
 <p>The following two sections describe the "key, value" based configuration as 
the API based version is anyway described by the JavaDoc.</p>
 <h3 id="text-processing-configuration">Text Processing Configuration</h3>
-<h4 
id="proper-noun-linking-wzxhzdk16enhancerengineslinkingpropernounsstatewzxhzdk17">Proper
 Noun Linking 
<small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
+<h4 
id="proper-noun-linking-wzxhzdk15enhancerengineslinkingpropernounsstatewzxhzdk16">Proper
 Noun Linking 
<small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
 <p>This is a high level configuration option allowing users to easily specify 
if they want to do EntityLinking based on any Nouns ("Noun Linking") or only 
ProperNouns ("Proper Noun Linking").
 Configuration wise this will pre-set the defaults for the linkable 
<em>LexcicalCategories</em> and <em>Pos</em> types.</p>
 <p>"Noun linking" is equivalent to the behavior of the <a 
href="keywordlinkingengine">KeywordLinkingEngine</a> while "Proper Noun 
Linking" is similar to using NER (Named Entity Recognition) with the <a 
href="namedentityextractionengine">NamedEntityLinking</a> engine. </p>
@@ -171,7 +170,7 @@ Configuration wise this will pre-set the
 </li>
 </ol>
 <p>If suitable it is strongly recommended to activate "Proper Noun Linking" as 
it highly increases the performance because in typical text only around 1/10 of 
the Nouns are marked as Proper Nouns and therefore the amount of vocabulary 
lookups also decreases by this amount.</p>
-<h4 
id="language-processing-configuration-wzxhzdk18enhancerengineslinkingprocessedlanguageswzxhzdk19">Language
 Processing configuration 
<small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
+<h4 
id="language-processing-configuration-wzxhzdk17enhancerengineslinkingprocessedlanguageswzxhzdk18">Language
 Processing configuration 
<small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
 <p>This parameter is used for two things: (1) to specify what languages are 
processed and (2) to provide specific configurations on how languages are 
processed. For the 2nd aspect there is also a default configuration that can be 
extended with language specific setting.</p>
 <p><strong>1. Processed Languages Configuration:</strong></p>
 <p>For the configuration of the processed languages the following syntax is 
used:</p>

svn commit: r865093 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/entitylinking.html
