Author: rwesten
Date: Wed Jan 30 13:21:36 2013
New Revision: 1440398
URL: http://svn.apache.org/viewvc?rev=1440398&view=rev
Log:
fixed a lot of broken links; minor improvements
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
(original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
Wed Jan 30 13:21:36 2013
@@ -8,10 +8,12 @@ Overview:
This section covers the following topics:
* [Stanbol NLP processing](#stanbol-nlp-processing): Short introduction to NLP
processing in Stanbol
-* The [NLP processing API]: Information about the Java API of the NLP
processing Framework including information on
+* The [NLP processing API](#nlp-processing-api): Information about the Java
API of the NLP processing Framework including information on
* How to implement an [NLP EnhancementEngine](nlpengine) and
* How to integrate NLP frameworks as a [RESTful NLP Analysis
Service](restintegration)
-* Finally this document provides information about already [Integrated NLP
processing Frameworks](#integrated-nlp-frameworks) and [Supported
Languages](#supported-languages)
+* Finally a list of supported NLP frameworks and languages:
+ * [Integrated NLP processing Frameworks](#integrated-nlp-frameworks) and
+ * [Supported Languages](#supported-languages)
Additional Information can be found in
@@ -81,11 +83,11 @@ This section provides an overview about
### Integrated NLP frameworks
-* __[OpenNLP](openly)__: Apache OpenNLP is the default NLP processing
framework used by Stanbol. OpenNLP supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages. Users can extend support to additional
languages by providing their own statistical models.
+* __[OpenNLP](opennlp)__: Apache OpenNLP is the default NLP processing
framework used by Stanbol. OpenNLP supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages. Users can extend support to additional
languages by providing their own statistical models.
* __[Smartcn](smartcn)__: The Lucene Smartcn Analyzer integration provides
basic language support for Chinese by providing _Sentence Detection_ and
_Tokenization_ engines.
-* __[Paoding](padding)__: The Paoding Analyzer is an alternative to Smartcn
for basic Chinese language support. Paoding only supports _Tokenization_ and is
therefore best used in combination with the [Smartcn](smartcn) _Sentnece
Detection_ engine.
+* __[Paoding](paoding)__: The Paoding Analyzer is an alternative to Smartcn
for basic Chinese language support. Paoding only supports _Tokenization_ and is
therefore best used in combination with the [Smartcn](smartcn) _Sentence
Detection_ engine.
* __[CELI / linguagrid.org](celi)__: CELI contributed Stanbol
EnhancementEngines based on their NLP processing Framework. It supports _Named
Entity Recognition_ for French and Italian as well as _Lemmatization_ and
lexical analysis for Italian, Danish, Russian, Romanian and Swedish. In
addition CELI also provides a Language identification service
@@ -97,20 +99,20 @@ This section provides an overview about
* __[Freeling](https://github.com/insideout10/stanbol-freeling)__: _Freeling_
is a [GPL](http://www.fsf.org/licenses/gpl.html) licensed NLP processing
framework implemented in <code>C</code>. It supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages including English, Spanish, Italian, Russian
and Portuguese.
- The integration is based on the [RESTful NLP analysis
service](restintegration) specification. That means that users will need to
install and configure Freeling and than run the [Stanbol Freeling
Server](https://github.com/insideout10/stanbol-freeling/tree/master/freeling-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` as well as the
[RESTful NLP Language Identification Engine](../engines/restfullangident) with
the `/langident` endpoint of their Stanbol Freeling Server.
+ The integration is based on the [RESTful NLP analysis
service](restfulnlpanalysisservice) specification. That means that users will
need to install and configure Freeling and then run the [Stanbol Freeling
Server](https://github.com/insideout10/stanbol-freeling/tree/master/freeling-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint as well as
the [RESTful NLP Language Identification Engine](../engines/restfullangident)
with the `/langident` endpoint of their Stanbol Freeling Server.
__NOTE__: As the license of Freeling is not compatible with the ASL this
project is hosted on
[https://github.com/insideout10/stanbol-freeling](https://github.com/insideout10/stanbol-freeling)
and is NOT a part of Apache Stanbol. Users that want to use it will need to
download and install it themselves.
* __[Talismane](https://github.com/westei/stanbol-talismane)__: Talismane is
an [AGPL](http://www.fsf.org/licenses/agpl.html) licensed NLP processing
framework implemented in Java. It supports _Sentence Detection_,
_Tokenization_ and _Part of Speech_ tagging for French.
- The integration is based on the [RESTful NLP analysis
service](restintegration) specification. That means that users will need to
download and build the
[Stanbol-Talismane](https://github.com/westei/stanbol-talismane) project and
than run the [Stanbol Talismane
Server](https://github.com/westei/stanbol-talismane/tree/master/talismane-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint of their
Stanbol-Talismane server
+ The integration is based on the [RESTful NLP analysis
service](restfulnlpanalysisservice) specification. That means that users will
need to download and build the
[Stanbol-Talismane](https://github.com/westei/stanbol-talismane) project and
then run the [Stanbol Talismane
Server](https://github.com/westei/stanbol-talismane/tree/master/talismane-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint of their
Stanbol-Talismane server.
__NOTE__: As the license of Talismane is not compatible with the ASL this
project is hosted on
[https://github.com/westei/stanbol-talismane](https://github.com/westei/stanbol-talismane)
and is NOT a part of Apache Stanbol. Users that want to use it will need to
download and install it themselves.
### Supported Languages
-* __Catalan__ _(ça)_
+* __Catalan__ _(ca)_
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and basic _NER_ without
classification
* __Chinese__ _(zh)_
@@ -118,27 +120,27 @@ This section provides an overview about
* [Paoding](paoding): _Tokenization_
* __Danish__ _(da)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Dutch__ _(nl)_
* [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
and full _NER_ for Persons, Organizations and Places
* __English__ _(en)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_
tagging, _Chunking_ and full _NER_ for Persons, Organizations and Places
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging,
_Chunking_ and full _NER_ for Persons, Organizations and Places
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and full _NER_ for
Persons, Organizations and Places
- * [OpenCalais](../engines/opencalaisengine): __NER__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __French__ _(fr)_
* [Talismane](https://github.com/westei/stanbol-talismane): _Sentence
Detection_, _Tokenization_ and _Part of Speech_ tagging
* [CELI](celi): _NER_
- * [OpenCalais](../engines/opencalaisengine): __Named Entity Recoqunition__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __Galician__ _(gl)_
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and _NER_ but without
classification
* __German__ _(de)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
including Proper Noun support and _Chunking_ (only Noun phrases)
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
including Proper Noun support and _Chunking_ (only Noun phrases)
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Italian__ _(it)_
@@ -160,9 +162,9 @@ This section provides an overview about
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Spanish__ _(es)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
(no Proper Noun support) and _NER_ for Persons, Organizations and Places
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
(no Proper Noun support) and _NER_ for Persons, Organizations and Places
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and full _NER_ for
Persons, Organizations and Places
- * [OpenCalais](../engines/opencalaisengine): __NER__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __Swedish__ _(sv)_
* [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_ and _POS_
tagging.
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
Wed Jan 30 13:21:36 2013
@@ -44,7 +44,7 @@ Users that want to process texts by usin
opennlp-ner
{your-named-entity-linking}
-where `{your-named-entity-linking}` refers to an instance of the
[NamedEntityLinkingEngine](../engines/namedentitytaggingengine) configured for
the users controlled vocabulary. Users can also use multiple
NamedEntityLinkingEngines configuration in the same chain. Users that want to
use NER models for other types than Persons, Organizations or Places will need
to use the [CustomNerModelEngine](../engines/customnermodelengine.mdtext)
instead of the `opennlp-ner` engine.
+where `{your-named-entity-linking}` refers to an instance of the
[NamedEntityLinkingEngine](../engines/namedentitytaggingengine) configured for
the user's controlled vocabulary. Users can also use multiple
NamedEntityLinkingEngine configurations in the same chain. Users that want to
use NER models for other types than Persons, Organizations or Places will need
to use the [CustomNerModelEngine](../engines/opennlpcustomner) instead of the
`opennlp-ner` engine.
Note that the use of the `opennlp-token` and `opennlp-sentence` engines is
optional, as the `opennlp-ner` engine will do those steps itself in case tokens
and sentences are not yet available. Including those engines explicitly in the
chain is only required in cases where custom configurations for the tokenizer
and sentence detection engines (e.g. custom OpenNLP models) need to be applied.
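For illustration, the chain layout described above could be written down as an
OSGi configuration for a Stanbol List Chain. This is only a sketch: the
configuration file name, the chain name and the `my-entity-linking` engine name
are illustrative assumptions, so check the Enhancement Chain documentation for
the exact property keys before using it:

    :::ini
    # File name is an assumption, e.g.:
    # org.apache.stanbol.enhancer.chain.list.impl.ListChain-ner.config
    stanbol.enhancer.chain.name="opennlp-ner-chain"
    stanbol.enhancer.chain.list.enginelist=["opennlp-sentence","opennlp-token","opennlp-ner","my-entity-linking"]

The engines run in list order, so sentence detection and tokenization results
are available before `opennlp-ner` and the linking engine execute.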
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
Wed Jan 30 13:21:36 2013
@@ -9,7 +9,7 @@ The integration of the Stanbol NLP proce
* Tokenize parsed Chinese Text
* Tokenize Chinese labels of entities in the controlled vocabulary.
-It is highly recommended to use the Paoding Analyzer in combination with the
[Smartcn](smatcn) as the Smartcn Analyzer provide Sentence detection.
+It is highly recommended to use the Paoding Analyzer in combination with
[Smartcn](smartcn), as the Smartcn Analyzer provides Sentence Detection.
Installation
@@ -60,18 +60,18 @@ To use the Paoding Analyzer for Chinese
1. the fieldType specification for Chinese
- :::xml
- <fieldType name="text_zh" class="solr.TextField">
- <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer"/>
- </fieldType>
+ :::xml
+ <fieldType name="text_zh" class="solr.TextField">
+ <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer"/>
+ </fieldType>
2. A dynamic field using this field type that matches against Chinese language
literals
- :::xml
- <!--
- Dynamic field for Chinese languages.
- -->
- <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
+ :::xml
+ <!--
+ Dynamic field for Chinese languages.
+ -->
+ <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
@@ -104,4 +104,4 @@ If you want to create an empty SolrYard
If you want to use the paoding.solrindex.zip as default you can rename the
file in the datafiles folder to "default.solrindex.zip" and then enable the
"Use default SolrCore configuration"
(org.apache.stanbol.entityhub.yard.solr.useDefaultConfig) option when you
configure a SolrYard instance.
-See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites)).
\ No newline at end of file
+See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites).
\ No newline at end of file
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
Wed Jan 30 13:21:36 2013
@@ -47,30 +47,30 @@ For that you will need to add two things
1. A fieldType specification for Chinese
- :::xml
- <fieldType name="text_zh" class="solr.TextField"
positionIncrementGap="100">
- <analyzer type="index">
- <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
- <filter class="solr.SmartChineseWordTokenFilterFactory"/>
- <filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
- </analyzer>
- <analyzer type="query">
- <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
- <filter class="solr.SmartChineseWordTokenFilterFactory"/>
- <filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
- <filter class="solr.PositionFilterFactory" />
- </analyzer>
- </fieldType>
-
+ :::xml
+ <fieldType name="text_zh" class="solr.TextField"
positionIncrementGap="100">
+ <analyzer type="index">
+ <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
+ <filter class="solr.SmartChineseWordTokenFilterFactory"/>
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+ </analyzer>
+ <analyzer type="query">
+ <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
+ <filter class="solr.SmartChineseWordTokenFilterFactory"/>
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+ <filter class="solr.PositionFilterFactory" />
+ </analyzer>
+ </fieldType>
+
2. A dynamic field using this field type that matches against Chinese language
literals
- :::xml
- <!--
- Dynamic field for Chinese languages.
- -->
- <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
+ :::xml
+ <!--
+ Dynamic field for Chinese languages.
+ -->
+ <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
The
[smartcn.solrindex.zip](https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/smartcn.solrindex.zip)
is identical with the default configuration but uses the above fieldType and
dynamicField specification.
@@ -94,4 +94,4 @@ If you want to create an empty SolrYard
If you want to use the smartcn.solrindex.zip as default you can rename the
file in the datafiles folder to "default.solrindex.zip" and then enable the
"Use default SolrCore configuration"
(org.apache.stanbol.entityhub.yard.solr.useDefaultConfig) option when you
configure a SolrYard instance.
-See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites)).
+See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites).
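The "rename to default" step described for both paoding.solrindex.zip and
smartcn.solrindex.zip boils down to copying the archive next to the launcher.
A minimal shell sketch, run here against a temporary directory because the
real datafiles location depends on the launcher's working directory (the path
is an assumption, not part of this commit):

```shell
# Make a language-specific SolrYard index configuration the default one.
# A real installation would use the launcher's datafiles folder instead
# of the temporary directory created here.
DATAFILES=$(mktemp -d)
touch "$DATAFILES/smartcn.solrindex.zip"   # stand-in for the real archive
cp "$DATAFILES/smartcn.solrindex.zip" "$DATAFILES/default.solrindex.zip"
ls "$DATAFILES"
```

With the copy in place, enabling "Use default SolrCore configuration" on a
SolrYard instance picks up this archive.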