Author: buildbot
Date: Thu Oct 3 13:02:37 2013
New Revision: 881015
Log:
Staging update by buildbot for stanbol
Modified:
websites/staging/stanbol/trunk/content/ (props changed)
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct 3 13:02:37 2013
@@ -1 +1 @@
-1528830
+1528838
Modified:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
==============================================================================
Binary files - no diff available.
Modified:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
==============================================================================
---
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
(original)
+++
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
Thu Oct 3 13:02:37 2013
@@ -120,12 +120,12 @@ Configurations can be created by using t
<p>This is the full list of supported Field encodings:</p>
<ul>
<li>SolrYard: This supports the encoding used by the Stanbol Entityhub SolrYard
implementation to encode RDF data types and language literals. If you configure
the FST Linking Engine for a Solr index built for the SolrYard you need to use
this encoding.</li>
-<li>MinusPrefix: {lang}-{field} (e.g. "en-name")</li>
-<li>UnderscorePrefix: {lang}_{field} (e.g. "en_name")</li>
-<li>AtPrefix: {lang}@{field} (e.g. "en@name")</li>
-<li>MinusSuffix: {field}-{lang} (e.g. "name-en")</li>
-<li>UnderscoreSuffix: {field}-{lang} (e.g. "name_en")</li>
-<li>AtSuffix: {field}-{lang} (e.g. "name@en")</li>
+<li>MinusPrefix: <code>{lang}-{field}</code> (e.g. "en-name")</li>
+<li>UnderscorePrefix: <code>{lang}_{field}</code> (e.g. "en_name")</li>
+<li>AtPrefix: <code>{lang}@{field}</code> (e.g. "en@name")</li>
+<li>MinusSuffix: <code>{field}-{lang}</code> (e.g. "name-en")</li>
+<li>UnderscoreSuffix: <code>{field}_{lang}</code> (e.g. "name_en")</li>
+<li>AtSuffix: <code>{field}@{lang}</code> (e.g. "name@en")</li>
<li>None: In this case no prefix/suffix rewriting of the configured
<code>field</code> and <code>stored</code> values is done. This means that the
FST Configuration MUST define the exact field names in the Solr index for every
configured language.</li>
</ul>
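<p>As an illustration of the prefix/suffix encodings listed above, the following
minimal Python sketch builds the Solr field name from a base field name and a
language. The table and the <code>encode_field</code> helper are hypothetical
and not part of Stanbol; the patterns follow the quoted examples (e.g.
"name_en", "name@en"). SolrYard is left out because its encoding is not
described here.</p>

```python
# Hypothetical illustration only: not Stanbol code.
# Patterns follow the examples quoted in the list above.
FIELD_ENCODINGS = {
    "MinusPrefix": "{lang}-{field}",
    "UnderscorePrefix": "{lang}_{field}",
    "AtPrefix": "{lang}@{field}",
    "MinusSuffix": "{field}-{lang}",
    "UnderscoreSuffix": "{field}_{lang}",
    "AtSuffix": "{field}@{lang}",
    # None: no rewriting, the configured name is used as-is
    "None": "{field}",
}


def encode_field(encoding, field, lang):
    """Return the Solr field name for a base field and a language."""
    if encoding not in FIELD_ENCODINGS:
        raise ValueError("unsupported field encoding: " + encoding)
    return FIELD_ENCODINGS[encoding].format(field=field, lang=lang)


if __name__ == "__main__":
    for name in FIELD_ENCODINGS:
        print(name, "->", encode_field(name, "name", "en"))
```

<p>With the encoding set to None the language is ignored by this helper, which
mirrors the requirement that the exact field name is configured for every
language.</p>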
<h4 id="fst-tagging-configuration">FST Tagging Configuration</h4>
@@ -147,7 +147,7 @@ Configurations can be created by using t
<ul>
<li><strong>field</strong>: The indexed field in the configured Solr index. In
multilingual scenarios this might be the 'base name' of the field that is
extended by a prefix or suffix to get the actual field name in the Solr index
(see also the field encoding configuration)</li>
<li><strong>stored</strong> (default: <em>field</em> value): The field in the
Solr index with the stored label information. This parameter is optional. If
not present <code>stored</code> is assumed to be equal to
<code>field</code>.</li>
-<li><strong>fst</strong> (default based on <em>field</em> value): Optionally
allows to manually specify the base file name of the FST models. Those files
are assumed within the data directory of the configured Solr index under
<code>fst/{fst}.{lang}.fst</code>. By default the configured <code>field</code>
name is used (with non alpha-numeric chars replaced by '_').If runtime creation
is enabled those files will be created if not present.</li>
+<li><strong>fst</strong> (default based on <em>field</em> value): This
parameter specifies the name of the FST file stored within the FST
directory (as configured by the [FST storage location]). The default name is
generated from the <code>field</code> value with non-alphanumeric chars replaced
by '_'.</li>
<li><strong>generate</strong> (default: false): If enabled the Engine will
generate missing FST models. If this is enabled the engine will also be able to
update FST models after changes to the Solr Index. <strong>NOTE</strong> that
the creation of FST models is an expensive operation (in terms of both CPU and
memory), which is why the default is <code>false</code>. The FST engine uses a
pool of low priority threads to create FST models. The size of the pool can be
configured by using the
<code>enhancer.engines.linking.lucenefst.fstThreadPoolSize</code> parameter.</li>
</ul>
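<p>The default naming rule for the <code>fst</code> parameter can be sketched as
follows. This is a minimal illustration, not the engine's implementation: the
function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters
and digits, and the <code>{fst}.{lang}.fst</code> file name pattern is taken
from the earlier wording of this parameter's description.</p>

```python
import re

# Hypothetical illustration only: not Stanbol code.
# Assumption: "alphanumeric" means ASCII letters and digits.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def default_fst_name(field):
    """Derive the default FST base name from the configured field."""
    return NON_ALNUM.sub("_", field)


def fst_file_name(fst, lang):
    """File name of the FST model for one language: {fst}.{lang}.fst."""
    return "{}.{}.fst".format(fst, lang)


if __name__ == "__main__":
    base = default_fst_name("rdfs:label")
    print(base)                       # rdfs_label
    print(fst_file_name(base, "en"))  # rdfs_label.en.fst
```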
<p>A more advanced Configuration might look like:</p>
@@ -187,10 +187,11 @@ Configurations can be created by using t
<li><code>solr-server-name</code>: the name of the <a
href="/docs/trunk/utils/commons-solr#referencedsolrserver">ReferencedSolrServer</a>
or <a
href="/docs/trunk/utils/commons-solr#managedsolrserver">ManagedSolrServer</a>
holding the SolrCore (see also [Configuration of the Solr Index])</li>
<li><code>solr-core-name</code> : the name of the SolrCore</li>
</ul>
-<p>The default value of this property is <code>${solr-data-dir}/fst</code>. To
manage FST models within the Stanbol folder you can us e.g.
<code>${sling.home}/fst/${solr-server-name}/solr-core-name</code>.</p>
+<p>The default value of this property is '<code>${solr-data-dir}/fst</code>'.
To manage FST models within the Stanbol folder you can use e.g.
'<code>${sling.home}/fst/${solr-server-name}/${solr-core-name}</code>'.</p>
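<p>How such a <code>${...}</code> template could resolve to a concrete directory
is sketched below. The <code>expand</code> helper and all example values are
assumptions made for illustration; the engine's actual substitution logic may
differ.</p>

```python
import re

# Hypothetical illustration only: not Stanbol code.
# Matches ${name}, where name may contain dots and hyphens.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template, values):
    """Replace every ${name} in template with values[name]."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for placeholder: " + name)
        return values[name]
    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    # Example values are made up for this sketch.
    values = {
        "solr-data-dir": "/opt/solr/mycore/data",
        "sling.home": "/opt/stanbol",
        "solr-server-name": "default",
        "solr-core-name": "mycore",
    }
    print(expand("${solr-data-dir}/fst", values))
    print(expand("${sling.home}/fst/${solr-server-name}/${solr-core-name}", values))
```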
<h3 id="entity-cache-configuration">Entity Cache Configuration</h3>
<p>While FST tagging is done fully in-memory the FST linking engine needs to
read information of matching Entities from the Solr index. This requires disk
I/O and is typically the part of the process that consumes the most time. The
Entity Cache tries to prevent such disk-level I/O by caching SolrDocuments
containing only fields required for the linking process (labels, types and (if
available) entity rankings). To further reduce memory requirements only labels
in languages requested by processed ContentItems are stored in the cache. The
Cache uses LRU semantics and is based on the Solr cache implementation.</p>
-<p>The size of the cache can be configured by using the
<code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The
default size is ~65k entities. Increasing the maximum size of the cache will
improve performance. For small and medium sized vocabularies the cache can be
configured </p>
+<p>The size of the cache can be configured by using the
<code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The
default size is ~65k entities. Increasing the maximum size of the cache will
improve performance. </p>
+<p><strong>TIP:</strong> For small and medium sized vocabularies the cache can
be configured to be at least as large as the number of Entities in the
Vocabulary. In this case the FST linking engine will operate fully in-memory.
In such scenarios linking was up to 100 times faster than with the <a
href="entityhublinking">Entityhub Linking Engine</a>.</p>
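<p>The LRU semantics of the Entity Cache can be illustrated with a small,
generic sketch. This is not the Solr cache implementation the engine builds on;
the class and method names are hypothetical and only show why a cache at least
as large as the vocabulary never evicts an entity.</p>

```python
from collections import OrderedDict

# Hypothetical illustration only: a generic LRU cache, not the Solr cache
# implementation used by the engine.


class LruCache:
    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if full."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the oldest entry


if __name__ == "__main__":
    cache = LruCache(max_size=2)
    cache.put("entity:a", {"label": "A"})
    cache.put("entity:b", {"label": "B"})
    cache.get("entity:a")                  # "a" is now most recently used
    cache.put("entity:c", {"label": "C"})  # evicts "b"
    print(cache.get("entity:b"))
```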
<h3 id="text-processing-configuration">Text Processing Configuration</h3>
<p>With the extension of the SolrTextTagger with a <a
href="https://github.com/OpenSextant/SolrTextTagger/pull/7">TaggingAttribute</a>
the FST linking engine can support the exact same text processing
functionality as the other Entity Linking Engine.</p>
<p>For the configuration please see the <a
href="entitylinking#text-processing-configuration">Text Processing
configuration</a> section of the Entity Linking Engine.</p>
@@ -200,9 +201,9 @@ Configurations can be created by using t
<li><s><strong>Label Field</strong>
<em>(enhancer.engines.linking.labelField)</em></s>: The label field is
<strong>IGNORED</strong> as the field holding the labels is already provided by
the [FST Tagging Configuration]. That means that the field defined by the
<em>stored</em> parameter is used. If the <em>stored</em> parameter is not
present it falls back to the <em>field</em> parameter.</li>
<li><s><strong>Type Field</strong>
<em>(enhancer.engines.linking.typeField)</em></s>: This configuration gets
<strong>IGNORED</strong> in favor of the
<code>enhancer.engines.linking.lucenefst.typeField</code>. See the [Additional
Entity Information] section for details. </li>
<li><s><strong>Redirect Field</strong>
<em>(enhancer.engines.linking.redirectField)</em></s>: Not implemented.
<strong>NOTE</strong> This might not be possible to implement efficiently,
because redirects would already need to be considered when building the FST
models.</li>
-<li><s><strong>Use EntityRankings
(enhancer.engines.linking.useEntityRankings)_</s>: This configuration gets
</strong>IGNORED__. EntityRanking based sorting is enabled as soon as the
<em>Entity Ranking Field</em> is configured.</li>
+<li><s><strong>Use EntityRankings</strong>
<em>(enhancer.engines.linking.useEntityRankings)</em></s>: This configuration
gets <strong>IGNORED</strong>. EntityRanking based sorting is enabled as soon
as the <em>Entity Ranking Field</em> is configured.</li>
<li><s><strong>Lemma based Matching</strong>
<em>(enhancer.engines.linking.lemmaMatching)</em></s>: Not Yet implemented</li>
-<li><s><strong>Min Match Score</strong>
<em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented.
Currently all linked Entities are added regardless of their score. However the
way the Tagging is done makes it very unlikely to have suggestions with
<code>fise:confidence</code> values less as 0.5.</li>
+<li><s><strong>Min Match Score</strong>
<em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented. The
FST linking engine is based on the Lucene Analyzer chains configured for the
<em>index</em> and <em>store</em> field of the FST configuration. An Entity is
only suggested if Tokens match after the Analyzers were applied.</li>
</ul>
<p>In addition the following properties are <strong>IGNORED</strong> as they
are not relevant for the FST Linking Engine:</p>
<ul>