Author: buildbot
Date: Thu Oct 17 10:50:17 2013
New Revision: 882978
Log:
Staging update by buildbot for stanbol
Added:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/comention.html
Modified:
websites/staging/stanbol/trunk/content/ (props changed)
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct 17 10:50:17 2013
@@ -1 +1 @@
-1532966
+1533039
Added:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/comention.html
==============================================================================
---
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/comention.html
(added)
+++
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/comention.html
Thu Oct 17 10:50:17 2013
@@ -0,0 +1,143 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - Co-Mention Engine</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link title="doap" rel="meta" type="application/rdf+xml" href="/doap.rdf"/>
+ <link rel="icon" type="image/png"
href="/images/stanbol-logo/stanbol-favicon.png"/>
+ <script type="text/javascript">
+ // Google Analytics Tracking Code
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-32086816-1']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type = 'text/javascript';
ga.async = true;
+ ga.src = ('https:' == document.location.protocol ? 'https://ssl' :
'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+</head>
+
+<body>
+ <div id="navigation"> <!-- but auto scroll the menu -->
+ <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101"
border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+ <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development
Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+<p><br /><a href="/doap.rdf"><img style="margin-left: 1em;" border="0"
alt="DOAP File" src="/images/doap.png"/></a></p>
+ </div>
+ <div id="content">
+ <div class="breadcrumbs">
+ <ul> <li><a href="/">Home</a></li> <li class="item"><a
href="/docs/">Docs</a></li> <li class="item"><a
href="/docs/trunk/">Trunk</a></li> <li class="item"><a
href="/docs/trunk/components/">Components</a></li> <li class="item"><a
href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a
href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+ </div>
+ <h1 class="title">Co-Mention Engine</h1>
+ <p>The Co-Mention engine aims to link initial mentions of Entities with
later references in the Text.</p>
+<p>A typical example is a person who is mentioned only by family name after
an initial mention with the full name, e.g.</p>
+<div class="codehilite"><pre><span class="p">...</span> <span
class="n">Barack</span> <span class="n">Obama</span> <span
class="n">gave</span> <span class="n">a</span> <span class="n">talk</span>
<span class="n">to</span> <span class="n">members</span> <span
class="n">of</span> <span class="n">the</span> <span class="n">Labor</span>
<span class="n">Union</span> <span class="p">...</span> <span
class="n">Obama</span> <span class="n">specially</span> <span
class="n">mentioned</span> <span class="p">...</span>
+</pre></div>
+
+
+<p><strong>NOTE:</strong> This Engine does <em>NOT</em> provide or use NLP
co-reference support (e.g. linking a Pronoun with the Entity it stands for).
Its purpose is to (1) link follow-up mentions of Entities with the original one
and (2) add the suggestions of the initial mention to follow-up mentions.</p>
+<h2 id="configuration">Configuration</h2>
+<p>As this engine uses the entity linking functionality of the <a
href="entitylinking">EntityLinkingEngine</a>, its configuration uses properties
defined by the <a href="entitylinking#entity-linker-configuration">Entity
Linker Configuration</a>.</p>
+<ul>
+<li><strong>Name</strong> <em>(stanbol.enhancer.engine.name)</em>: The name of
the Enhancement Engine. This name is used to refer to an <a
href="index.html">EnhancementEngine</a> in <a
href="../chains">EnhancementChain</a>s.</li>
+<li><strong>Service Ranking</strong> <em>(service.ranking)</em>: In case
multiple enhancement engines use the same name, only the one with the
highest ranking will be used.</li>
+<li><strong>Case Sensitivity</strong>
<em>(enhancer.engines.linking.caseSensitive)</em>: Boolean switch that
activates or deactivates case sensitive matching. Note that even with case
sensitivity activated, an Entity with a label such as
"Anaconda" will be suggested for the mention of "anaconda" in the text. The
main difference is the confidence value of such a suggestion, because with case
sensitivity activated the starting letters "A" and "a" are NOT considered to be
matching. See the second technical part for details about the matching process.
Case sensitivity is deactivated by default. Activating it is recommended
if controlled vocabularies contain abbreviations similar to commonly used words,
e.g. CAN for Canada.</li>
+<li><strong>Proper Noun Linking</strong>
<em>(enhancer.engines.linking.properNounsState)</em>: Enables or disables proper
noun linking when searching for co-mentions. By default this is disabled so that
Common Nouns are also considered when searching for co-mentions. However, for
Vocabularies that only contain Proper Nouns (Persons, Organizations, ...)
enabling this might be useful. For the full documentation of this feature see
the <a href="entitylinking#text-processing-configuration">Text Processing
Configuration</a> section of the EntityLinking engine.</li>
+<li><strong>Processed Languages</strong>
<em>(enhancer.engines.linking.processedLanguages)</em>: Allows detailed
configuration of how NLP processing results are consumed by the
Co-Mention engine. For the full documentation of this feature see the <a
href="entitylinking#text-processing-configuration">Text Processing
Configuration</a> section of the EntityLinking engine.</li>
+</ul>
+<p>The following supported properties are not included in the Felix Webconsole
configuration dialog and can only be set via OSGi configuration
files. See the <a href="entitylinking">Entity Linking Engine</a> configuration
for the full documentation of those properties.</p>
+<ul>
+<li><strong>Min Search Token Length</strong>
<em>(enhancer.engines.linking.minSearchTokenLength)</em></li>
+<li><strong>Minimum Token Match Score</strong>
<em>(enhancer.engines.linking.minTokenScore)</em></li>
+<li><strong>Lemma based Matching</strong>
<em>(enhancer.engines.linking.lemmaMatching)</em></li>
+<li><strong>Max Search Token Distance</strong>
<em>(enhancer.engines.linking.maxSearchTokenDistance)</em></li>
+<li><strong>Max Search Tokens</strong>
<em>(enhancer.engines.linking.maxSearchTokens)</em></li>
+</ul>
+<p>The following properties of the EntityLinking engine are ignored or set to fixed values:</p>
+<ul>
+<li><strong>Type Mappings</strong>
<em>(enhancer.engines.linking.typeMappings)</em>: The Co-Mention engine uses
the dc:type values of the initial mention. Therefore dc:type mappings do not
need to be specified.</li>
+<li><strong>Default Matching Language</strong>
<em>(enhancer.engines.linking.defaultMatchingLanguage)</em>: The engine uses
the language detected for the parsed document for matching.</li>
+<li><strong>Redirect Field</strong>
<em>(enhancer.engines.linking.redirectField)</em> and <strong>Redirect
Mode</strong> <em>(enhancer.engines.linking.redirectMode)</em>: The engine uses
suggestions of the initial mention. Redirects were already processed for those
suggestions. Therefore the Co-Mention engine does not need to deal with
redirects.</li>
+<li><strong>Label Field</strong>
<em>(enhancer.engines.linking.labelField)</em>: The engine uses the initial
mention as label to search for co-mentions. Because of that, no label field
needs to be configured.</li>
+<li><strong>Type Field</strong> <em>(enhancer.engines.linking.typeField)</em>:
The engine uses the types of the suggestions for the initial mentions.</li>
+<li><strong>Suggestions</strong>
<em>(enhancer.engines.linking.suggestions)</em>: The Co-Mentions Engine adds
all suggestions of the initial mention to co-mentions.</li>
+<li><strong>Min Matched Tokens</strong>
<em>(enhancer.engines.linking.minFoundTokens)</em> is set to '1', meaning that
at least a single token of the initial mention needs to match co-mentions.</li>
+<li><strong>Min Label Score</strong>
<em>(enhancer.engines.linking.minLabelScore)</em> is set to '1/4', meaning that
at least 1/4 of the tokens of the initial mention need to be present in
co-mentions.</li>
+<li><strong>Min Match Score</strong>
<em>(enhancer.engines.linking.minMatchScore)</em> is set to a value so that it
does not filter any results.</li>
+</ul>
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>
+
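Note on the added comention.html page: the two fixed thresholds it documents (Min Matched Tokens = 1, Min Label Score = 1/4) can be illustrated with a small Java sketch. This is not the engine's implementation. The class and method names are hypothetical, tokens are compared as lower-cased exact strings, and the label score is assumed to be the fraction of the initial mention's tokens that also occur in the candidate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Stanbol: illustrates the two fixed thresholds only. */
final class CoMentionThresholdSketch {

    static final int MIN_FOUND_TOKENS = 1;      // documented fixed value '1'
    static final double MIN_LABEL_SCORE = 0.25; // documented fixed value '1/4'

    /** Counts tokens of the initial mention that also occur in the candidate (assumption: case-insensitive exact match). */
    static int matchedTokens(List<String> initialMention, List<String> candidate) {
        Set<String> candidateTokens = new HashSet<>();
        for (String token : candidate) {
            candidateTokens.add(token.toLowerCase(Locale.ROOT));
        }
        int matched = 0;
        for (String token : initialMention) {
            if (candidateTokens.contains(token.toLowerCase(Locale.ROOT))) {
                matched++;
            }
        }
        return matched;
    }

    /** Applies both thresholds; the label score formula is an assumption made for this sketch. */
    static boolean isCoMention(List<String> initialMention, List<String> candidate) {
        if (initialMention.isEmpty()) {
            return false;
        }
        int matched = matchedTokens(initialMention, candidate);
        double labelScore = (double) matched / initialMention.size();
        return matched >= MIN_FOUND_TOKENS && labelScore >= MIN_LABEL_SCORE;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("Barack", "Obama");
        System.out.println(isCoMention(initial, Arrays.asList("Obama")));
        System.out.println(isCoMention(initial, Arrays.asList("Union")));
    }
}
```

With the example from the page, "Obama" matches one of the two tokens of "Barack Obama", so both thresholds are met under these assumptions.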
Modified:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
---
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
(original)
+++
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
Thu Oct 17 10:50:17 2013
@@ -273,7 +273,7 @@ Configuration wise this will pre-set the
<p>If used in combination with a disambiguation Engine one might want to
consider suggesting Entities where only a single token of a multi-token label
matches. In such cases a configuration like <em>Min Matched Tokens</em>=1 and
<em>Min Label Score</em> <= 0.5 (e.g. 0.4) might be considered. In such
scenarios users will also want to considerably increase the value for <em>Max
Suggestions</em> (typically values > 10).</p>
</li>
<li>
-<p><strong>Min Text Score</strong>
<em>(enhancer.engines.linking.minTextScore)</em> [0..1]::double: The "Text
Score" [0..1] represents how well the Label of an Entity matches to the
selected Span in the Text. It compares the number of matched {@link Token} from
the label with the number of Tokens enclosed by the Span in the Text an Entity
is suggested for. Not exact matches for Tokens, or if the Tokens within the
label do appear in an other order than in the text do also reduce this score.
Entities are only considered if at least one of their labels cores higher than
the minimum for all tree of <em>Min Labe Score</em>, <em>Min Text Match
Score</em> and <em>Min Match Score</em>.</p>
+<p><strong>Min Text Score</strong>
<em>(enhancer.engines.linking.minTextScore)</em> [0..1]::double: The "Text
Score" [0..1] represents how well the Label of an Entity matches the
selected Span in the Text. It compares the number of matched {@link Token} from
the label with the number of Tokens enclosed by the Span in the Text an Entity
is suggested for. Inexact matches for Tokens, or Tokens within the
label that appear in a different order than in the text, also reduce this score.
Entities are only considered if at least one of their labels scores higher than
the minimum for all three of <em>Min Label Score</em>, <em>Min Text Match
Score</em> and <em>Min Match Score</em>.</p>
</li>
<li><strong>Min Match Score</strong>
<em>(enhancer.engines.linking.minMatchScore)</em> [0..1]::double: Defined as
the product of the "Text Score" with the "Label Score", meaning that this
value represents both how well the label matches the text and how much of the
label is matched with the text. Entities are only considered if at least one of
their labels scores higher than the minimum for all three of <em>Min Label
Score</em>, <em>Min Text Match Score</em> and <em>Min Match Score</em>. </li>
<li><strong>Use EntityRankings</strong>
<em>(enhancer.engines.linking.useEntityRankings)</em> ::boolean (default=true):
Entity Rankings can be used to define the ranking (popularity, importance,
connectivity, ...) of an entity relative to others within the knowledge base.
While fise:confidence values calculated by the EntityLinkingEngine only
represent how well a label of the entity matches the given section in the
processed text, it makes sense for many use cases to sort Entities with the
same score based on their entity rankings (e.g. users would expect to get
"Paris (France)" suggested before "Paris (Texas)" for Paris appearing in a
text). Enabling this feature will slightly (< 0.1) change the score of
suggestions to ensure such an ordering. </li>
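Note on the entitylinking.html hunk above: the relationship between the three minimum scores can be sketched in Java as follows. This is a simplified illustration, not the EntityLinkingEngine code. The class and method names are hypothetical, the scores are reduced to plain token ratios (the documented penalties for inexact matches and token order are left out), the comparison is assumed to be "greater than or equal", and the threshold values in `main` are example values only.

```java
/** Hypothetical helper, not part of Stanbol: shows how the three minimum scores combine. */
final class LinkingScoreSketch {

    /** Assumption: share of the label's tokens that were matched in the text. */
    static double labelScore(int matchedTokens, int labelTokens) {
        return labelTokens == 0 ? 0.0 : (double) matchedTokens / labelTokens;
    }

    /** Assumption: share of the tokens in the selected span that were matched by the label. */
    static double textScore(int matchedTokens, int spanTokens) {
        return spanTokens == 0 ? 0.0 : (double) matchedTokens / spanTokens;
    }

    /** Documented definition: product of the text score and the label score. */
    static double matchScore(double textScore, double labelScore) {
        return textScore * labelScore;
    }

    /** A label is considered only if it reaches all three minimum values. */
    static boolean accepted(double labelScore, double textScore,
                            double minLabelScore, double minTextScore, double minMatchScore) {
        return labelScore >= minLabelScore
                && textScore >= minTextScore
                && matchScore(textScore, labelScore) >= minMatchScore;
    }

    public static void main(String[] args) {
        // One token of a two-token label matched against a one-token span.
        double label = labelScore(1, 2);
        double text = textScore(1, 1);
        System.out.println(matchScore(text, label));
        System.out.println(accepted(label, text, 0.4, 0.4, 0.3));
        System.out.println(accepted(label, text, 0.75, 0.4, 0.3));
    }
}
```

In this sketch, lowering the minimum label score to 0.4 lets a single matched token of a two-token label pass, which is the disambiguation scenario described in the context lines of the hunk.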
Modified:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
==============================================================================
---
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
(original)
+++
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
Thu Oct 17 10:50:17 2013
@@ -289,6 +289,13 @@
</ul>
</li>
<li>
+<p><strong><a href="comention">Entity Co-Mention Engine</a>:</strong></p>
+<ul>
+<li>Uses initial mentions of an Entity (e.g. 'Barack Obama' in 'Barack Obama
attended the UN security council ...')</li>
+<li>To detect co-mentions at a later position in the same document (e.g.
'Obama' in '... Obama indicated consent ...') </li>
+</ul>
+</li>
+<li>
<p><strong>DBpedia Spotlight Annotation Engine:</strong> Integration of the
DBpedia Spotlight with the Stanbol Enhancer (see <a
href="https://issues.apache.org/jira/browse/STANBOL-706">STANBOL-706</a>)</p>
<ul>
<li>includes NLP, Entity Linking and Disambiguation of Entities using <a
href="http://dbpedia.org">DBpedia</a> as knowledge base</li>