lucenefstlinking.html

buildbot Thu, 03 Oct 2013 06:03:15 -0700

Author: buildbot
Date: Thu Oct  3 13:02:37 2013
New Revision: 881015

Log:
Staging update by buildbot for stanbol


Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct  3 13:02:37 2013
@@ -1 +1 @@
-1528830
+1528838

Modified: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
==============================================================================
Binary files - no diff available.

Modified: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
 Thu Oct  3 13:02:37 2013
@@ -120,12 +120,12 @@ Configurations can be created by using t
 <p>This is the full list of supported Field encodings:</p>
 <ul>
 <li>SolrYard: This supports the encoding used by the Stanbol Entityhub SolrYard 
implementation to encode RDF data types and language literals. If you configure 
the FST Linking Engine for a Solr index built for the SolrYard you need to use 
this encoding.</li>
-<li>MinusPrefix: {lang}-{field} (e.g. "en-name")</li>
-<li>UnderscorePrefix: {lang}_{field} (e.g. "en_name")</li>
-<li>AtPrefix: {lang}@{field} (e.g. "en@name")</li>
-<li>MinusSuffix: {field}-{lang} (e.g. "name-en")</li>
-<li>UnderscoreSuffix: {field}_{lang} (e.g. "name_en")</li>
-<li>AtSuffix: {field}@{lang} (e.g. "name@en")</li>
+<li>MinusPrefix: <code>{lang}-{field}</code> (e.g. "en-name")</li>
+<li>UnderscorePrefix: <code>{lang}_{field}</code> (e.g. "en_name")</li>
+<li>AtPrefix: <code>{lang}@{field}</code> (e.g. "en@name")</li>
+<li>MinusSuffix: <code>{field}-{lang}</code> (e.g. "name-en")</li>
+<li>UnderscoreSuffix: <code>{field}_{lang}</code> (e.g. "name_en")</li>
+<li>AtSuffix: <code>{field}@{lang}</code> (e.g. "name@en")</li>
 <li>None: In this case no prefix/suffix rewriting of configured 
<code>field</code> and <code>store</code> values is done. This means that the 
FST Configuration MUST define the exact field names in the Solr index for every 
configured language.</li>
 </ul>
 <h4 id="fst-tagging-configuration">FST Tagging Configuration</h4>
@@ -147,7 +147,7 @@ Configurations can be created by using t
 <ul>
 <li><strong>field</strong>: The indexed field in the configured Solr index. In 
multilingual scenarios this might be the 'base name' of the field that is 
extended by a prefix or suffix to get the actual field name in the Solr index 
(see also the field encoding configuration)</li>
 <li><strong>stored</strong> (default: <em>field</em> value) : The field in the 
Solr index with the stored label information. This parameter is optional. If 
not present <code>stored</code> is assumed to be equal to 
<code>field</code>.</li>
-<li><strong>fst</strong> (default based on <em>field</em> value): Optionally 
allows to manually specify the base file name of the FST models. Those files 
are assumed within the data directory of the configured Solr index under 
<code>fst/{fst}.{lang}.fst</code>. By default the configured <code>field</code> 
name is used (with non alpha-numeric chars replaced by '_').If runtime creation 
is enabled those files will be created if not present.</li>
+<li><strong>fst</strong> (default based on <em>field</em> value): This 
parameter specifies the name of the FST file stored within the FST 
directory (as configured by the [FST storage location]). The default name is 
generated from the <code>field</code> value with non alpha-numeric chars replaced 
by '_'.</li>
 <li><strong>generate</strong> (default: false): If enabled the Engine will 
generate missing FST models. If this is enabled the engine will also be able to 
update FST models after changes to the Solr Index. <strong>NOTE</strong> that 
the creation of FST models is an expensive operation (both CPU and memory 
wise). The FST engine uses a pool of low priority threads to create FST models. 
The size of the pool can be configured by using the 
<code>enhancer.engines.linking.lucenefst.fstThreadPoolSize</code> parameter. 
Because of this the default is <code>false</code>.</li>
 </ul>
 <p>A more advanced Configuration might look like:</p>
@@ -187,10 +187,11 @@ Configurations can be created by using t
 <li><code>solr-server-name</code>: the name of the <a 
href="/docs/trunk/utils/commons-solr#referencedsolrserver">ReferencedSolrServer</a>
 or <a 
href="/docs/trunk/utils/commons-solr#managedsolrserver">ManagedSolrServer</a> 
holding the SolrCore (see also [Configuration of the Solr Index])</li>
 <li><code>solr-core-name</code> : the name of the SolrCore</li>
 </ul>
-<p>The default value of this property is <code>${solr-data-dir}/fst</code>. To 
manage FST models within the Stanbol folder you can use e.g. 
<code>${sling.home}/fst/${solr-server-name}/solr-core-name</code>.</p>
+<p>The default value of this property is '<code>${solr-data-dir}/fst</code>'. 
To manage FST models within the Stanbol folder you can use e.g. 
'<code>${sling.home}/fst/${solr-server-name}/solr-core-name</code>'.</p>
 <h3 id="entity-cache-configuration">Entity Cache Configuration</h3>
 <p>While FST tagging is fully done in-memory the FST linking engine needs to 
read information of matching Entities from the Solr index. This requires disc 
IO and is typically the part of the process that consumes the most time. The 
Entity Cache tries to prevent such disc level IO by caching SolrDocuments 
containing only fields required for the linking process (labels, types and (if 
available) entity rankings).  To further reduce memory requirements only labels 
in languages requested by processed ContentItems are stored in the cache. The 
Cache uses LRU semantics and is based on the Solr cache implementation.</p>
-<p>The size of the cache can be configured by using the 
<code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The 
default size is ~65k entities. Increasing the maximum size of the cache will 
improve performance. For small and medium sized vocabularies the cache can be 
configured </p>
+<p>The size of the cache can be configured by using the 
<code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The 
default size is ~65k entities. Increasing the maximum size of the cache will 
improve performance. </p>
+<p><strong>TIP:</strong> For small and medium sized vocabularies the cache can 
be configured to be &gt;= the number of Entities in the Vocabulary. In this 
case the FST linking engine will operate fully in-memory. For such scenarios 
linking was up to 100 times faster than with the <a 
href="entityhublinking">Entityhub Linking Engine</a>.</p>
 <h3 id="text-processing-configuration">Text Processing Configuration</h3>
 <p>With the extension of the SolrTextTagger with a <a 
href="https://github.com/OpenSextant/SolrTextTagger/pull/7">TaggingAttribute</a>
 the FST linking engine can support the exact same text processing 
functionality as the other Entity Linking Engine.</p>
 <p>For the configuration please see the <a 
href="entitylinking#text-processing-configuration">Text Processing 
configuration</a> section of the Entity Linking Engine.</p>
@@ -200,9 +201,9 @@ Configurations can be created by using t
 <li><s><strong>Label Field</strong> 
<em>(enhancer.engines.linking.labelField)</em></s>: The label field is 
<strong>IGNORED</strong> as the field holding the labels is anyway provided by 
the [FST Tagging Configuration]. That means that the field defined by the 
<em>stored</em> parameter is used. If the <em>stored</em> parameter is not 
present it falls back to the <em>field</em> parameter.</li>
 <li><s><strong>Type Field</strong> 
<em>(enhancer.engines.linking.typeField)</em></s>: This configuration gets 
<strong>IGNORED</strong> in favor of the 
<code>enhancer.engines.linking.lucenefst.typeField</code>. See the [Additional 
Entity Information] section for details. </li>
 <li><s><strong>Redirect Field</strong> 
<em>(enhancer.engines.linking.redirectField)</em></s>: Not implemented. 
<strong>NOTE</strong> This might not be possible to implement efficiently, as 
redirects would already need to be considered when building the FST models.</li>
-<li><s><strong>Use EntityRankings 
(enhancer.engines.linking.useEntityRankings)_</s>: This configuration gets 
</strong>IGNORED__. EntityRanking based sorting is enabled as soon as the 
<em>Entity Ranking Field</em> is configured.</li>
+<li><s><strong>Use EntityRankings</strong> 
<em>(enhancer.engines.linking.useEntityRankings)</em></s>: This configuration 
gets <strong>IGNORED</strong>. EntityRanking based sorting is enabled as soon 
as the <em>Entity Ranking Field</em> is configured.</li>
 <li><s><strong>Lemma based Matching</strong> 
<em>(enhancer.engines.linking.lemmaMatching)</em></s>: Not Yet implemented</li>
-<li><s><strong>Min Match Score</strong> 
<em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented. 
Currently all linked Entities are added regardless of their score. However the 
way the Tagging is done makes it very unlikely to have suggestions with 
<code>fise:confidence</code> values less as 0.5.</li>
+<li><s><strong>Min Match Score</strong> 
<em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented. The 
FST linking engine is based on the Lucene Analyzer chains configured for the 
<em>index</em> and <em>store</em> field of the FST configuration. An Entity is 
only suggested if Tokens match after the Analyzers were applied.</li>
 </ul>
 <p>In addition the following properties are <strong>IGNORED</strong> as they 
are not relevant for the FST Linking Engine:</p>
 <ul>
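
Illustration (not part of the commit): the field encodings and the default
FST name described in the diff above can be sketched in a few lines of
Python. This is only a sketch under stated assumptions: the helper names
encode_field and default_fst_name are hypothetical and are not Stanbol APIs,
the SolrYard encoding is omitted because the diff does not spell out its
pattern, the two suffix patterns follow the quoted examples ("name_en",
"name@en"), and "non alpha-numeric" is assumed to mean anything outside
ASCII letters and digits.

```python
import re

# Patterns taken from the list of supported field encodings.
# "None" leaves the configured field name untouched.
FIELD_ENCODINGS = {
    "MinusPrefix": "{lang}-{field}",
    "UnderscorePrefix": "{lang}_{field}",
    "AtPrefix": "{lang}@{field}",
    "MinusSuffix": "{field}-{lang}",
    "UnderscoreSuffix": "{field}_{lang}",
    "AtSuffix": "{field}@{lang}",
    "None": "{field}",
}


def encode_field(encoding, field, lang):
    """Hypothetical helper: build the Solr field name for one language."""
    if encoding not in FIELD_ENCODINGS:
        raise ValueError("unsupported field encoding: " + encoding)
    return FIELD_ENCODINGS[encoding].format(field=field, lang=lang)


def default_fst_name(field):
    """Hypothetical helper: default FST name derived from the field value.

    Assumes ASCII-only handling; every other character becomes '_'.
    """
    return re.sub(r"[^A-Za-z0-9]", "_", field)


if __name__ == "__main__":
    print(encode_field("AtSuffix", "name", "en"))
    print(default_fst_name("rdfs:label"))
```

With the "None" encoding the field name is returned unchanged, which matches
the note that the FST Configuration must then define the exact field names
for every configured language.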

svn commit: r881015 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png docs/trunk/components/enhancer/engines/lucenefstlinking.html