Author: rwesten
Date: Thu Apr 16 08:28:05 2015
New Revision: 1674017
URL: http://svn.apache.org/r1674017
Log:
Documentation for STANBOL-1418
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-linking-mode-specific-components.png
(with props)
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.mdtext
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-linking-mode-specific-components.png
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-linking-mode-specific-components.png?rev=1674017&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-linking-mode-specific-components.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.mdtext?rev=1674017&r1=1674016&r2=1674017&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.mdtext
Thu Apr 16 08:28:05 2015
@@ -97,18 +97,41 @@ This would set the index field to "fise:
#### Linking Mode
-The FST linking engine does support two different linking modes. Those are
configures using the __Linking Mode__
_(enhancer.engines.linking.lucenefst.mode)_ property.
+The FST linking engine supports three different linking modes. Those are
configured using the __Linking Mode__
_(enhancer.engines.linking.lucenefst.mode)_ property. The linking mode property
is no longer part of the configuration form, as there are now three separate
components with a specialized configuration for each linking mode.
-
-The two modes are
+The three modes are
-1. `PLAIN`: This mode links the plain text with the vocabulary. Every single
word of the text will get looked up with the vocabulary. This mode does not use
NLP results other than language detection. This mode also ot make use of the
[Text Processing Configuration](#text-processing-configuration). The PLAIN mode
works fine with smaller and specific vocabularies that do not only contain
entities but also things like product ids, activities, adjectives ...
+1. `PLAIN`: This mode links the plain text with the vocabulary. Every single
word of the text will get looked up with the vocabulary. This mode does not use
NLP results other than language detection. Because of that this mode will
ignore any [Text Processing Configuration](#text-processing-configuration). The
PLAIN mode works fine with smaller and specific vocabularies that do not only
contain entities but also things like product ids, activities, adjectives ...
2. `LINKABLE_TOKEN`: This mode links only linkable tokens of the parsed text.
The provided [Text Processing Configuration](#text-processing-configuration) is
used to determine linkable tokens in the text (based on NLP results). This is
the default mode for this engine. It is well suited for vocabularies containing
named entities (such as persons, cities, products, organizations, roles, ...)
-<!-- 3. `NER`: This mode will only consider detected Named Entities for
linking. This mode is similar to using the [Named Entity Linking
Engine](namedentitytaggingengine). This is a best mode if the enhancement chain
contains an NER component that can detect the types of entities contained in
the linked vocabulary. -->
+3. `NER`: This mode will only consider detected Named Entities for linking.
This mode is similar to using the [Named Entity Linking
Engine](namedentitytaggingengine). This is the best mode if the enhancement chain
contains an NER component that can detect the types of entities contained in
the linked vocabulary. Important for this mode is that Named Entity types can
be mapped to types of Entities in the linked vocabulary. This allows matching
entities to be validated based on their type. Those mappings are configured
by the __Named Entity Type Mappings__
_(enhancer.engines.linking.lucenefst.neTypeMapping)_ property.
+
+The _Named Entity Type Mappings_ uses the following syntax:
+
+ {named-entity-type} > {voc-type-1}[; {voc-type-2}; ...]
+
+meaning that the Named Entities with the `{named-entity-type}` will only
accept entities in the vocabulary with one of the `{voc-type-1}, {voc-type-2},
...` types. Entities of other types that would match the mention of the Named
Entities will get filtered.
+
+A typical configuration could look like the following:
+
+ dbp-ont:Person > dbp-ont:Person; schema:Person; foaf:Person
+ dbp-ont:Organisation > dbp-ont:Organisation; dbp-ont:Newspaper;
schema:Organization
+ dbp-ont:Place > dbp-ont:Place; schema:Place; geonames:Feature
+
+_NOTE:_ Full URIs can also be used.
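
The mapping syntax described above can be illustrated with a small parser. This is only a sketch: the class `NeTypeMappingSketch` and its methods are hypothetical names and not part of Apache Stanbol, and treating a Named Entity type without any mapping as unfiltered is an assumption made for the example, not documented engine behavior.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Stanbol) that parses lines of the form
 * {named-entity-type} > {voc-type-1}[; {voc-type-2}; ...]
 */
final class NeTypeMappingSketch {

    /** Parses mapping lines into: named entity type -> accepted vocabulary types. */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> mappings = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int sep = trimmed.indexOf('>');
            if (sep < 1) {
                throw new IllegalArgumentException("Missing type before '>' in: " + line);
            }
            String neType = trimmed.substring(0, sep).trim();
            Set<String> vocTypes = mappings.computeIfAbsent(neType, k -> new LinkedHashSet<>());
            for (String part : trimmed.substring(sep + 1).split(";")) {
                String vocType = part.trim();
                if (!vocType.isEmpty()) {
                    vocTypes.add(vocType);
                }
            }
        }
        return mappings;
    }

    /** Returns true if an entity with the given types passes the filter for neType. */
    static boolean accepts(Map<String, Set<String>> mappings, String neType, Set<String> entityTypes) {
        Set<String> allowed = mappings.get(neType);
        if (allowed == null) {
            return true; // assumption: no mapping configured means no type filtering
        }
        for (String type : entityTypes) {
            if (allowed.contains(type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> mappings = parse(List.of(
                "dbp-ont:Person > dbp-ont:Person; schema:Person; foaf:Person",
                "dbp-ont:Place > dbp-ont:Place; schema:Place; geonames:Feature"));
        // A vocabulary entity typed foaf:Person matches a dbp-ont:Person mention.
        System.out.println(accepts(mappings, "dbp-ont:Person", Set.of("foaf:Person"))); // true
        // A vocabulary entity typed dbp-ont:Place is filtered for a dbp-ont:Person mention.
        System.out.println(accepts(mappings, "dbp-ont:Person", Set.of("dbp-ont:Place"))); // false
    }
}
```

The sketch keeps the accepted types per Named Entity type in insertion order, so the parsed result mirrors the order written in the configuration.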
By default the FST linking engine uses the `LINKABLE_TOKEN` mode. In this mode this
engine behaves similarly to the [Entityhub Linking Engine](entityhublinking).
+As mentioned before, three OSGi components are provided for configuring FST
linking engines with the different modes:
+
+
+
+The __Apache Stanbol Enhancer Engine: FST Linking: Linkable Token__
_(org.apache.stanbol.enhancer.engines.lucenefstlinking.FstLinkingEngineComponent)_
is the default FstLinkingEngine component. It supports all configuration
parameters. When not using the user interface, it is strongly recommended to use
this component for the configuration of the FST linking engine.
+
+The __Apache Stanbol Enhancer Engine: FST Linking: Plain__
_(org.apache.stanbol.enhancer.engines.lucenefstlinking.PlainFstLinkingComponnet)_
can be used to configure a `PLAIN` mode linking engine. The form excludes any
[Text Processing Configuration](#text-processing-configuration) property, as
those are not used in the `PLAIN` mode anyway.
+
+The __Apache Stanbol Enhancer Engine: FST Linking: Named Entities__
_(org.apache.stanbol.enhancer.engines.lucenefstlinking.NamedEntityFstLinkingComponnet)_
is intended to allow the configuration of an FST linking engine in the `NER`
mode. It includes the __Named Entity Type Mappings__
_(enhancer.engines.linking.lucenefst.neTypeMapping)_ property in the form. This
is used to configure type mappings from the Named Entity types to types in the
linked vocabulary.
+
#### Additional Entity Information
