Author: rwesten
Date: Wed Jan 30 13:21:36 2013
New Revision: 1440398
URL: http://svn.apache.org/viewvc?rev=1440398&view=rev
Log:
fixed a lot of broken links; minor improvements
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
(original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/index.mdtext
Wed Jan 30 13:21:36 2013
@@ -8,10 +8,12 @@ Overview:
This section covers the following topics:
* [Stanbol NLP processing](#stanbol-nlp-processing): Short introduction to NLP
processing in Stanbol
-* The [NLP processing API]: Information about the Java API of the NLP
processing Framework including information on
+* The [NLP processing API](#nlp-processing-api): Information about the Java
API of the NLP processing Framework including information on
* How to implement an [NLP EnhancementEngine](nlpengine) and
* How to integrate NLP frameworks as a [RESTful NLP Analysis
Service](restintegration)
-* Finally this document provides information about already [Integrated NLP
processing Frameworks](#integrated-nlp-frameworks) and [Supported
Languages](#supported-languages)
+* Finally a list of supported NLP frameworks and languages:
+ * [Integrated NLP processing Frameworks](#integrated-nlp-frameworks) and
+ * [Supported Languages](#supported-languages)
Additional Information can be found in
@@ -81,11 +83,11 @@ This section provides an overview about
### Integrated NLP frameworks
-* __[OpenNLP](openly)__: Apache OpenNLP is the default NLP processing
framework used by Stanbol. OpenNLP supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages. Users can extend support to additional
languages by providing their own statistical models.
+* __[OpenNLP](opennlp)__: Apache OpenNLP is the default NLP processing
framework used by Stanbol. OpenNLP supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages. Users can extend support to additional
languages by providing their own statistical models.
* __[Smartcn](smartcn)__: The Lucene Smartcn Analyzer integration provides
basic language support for Chinese by providing _Sentence Detection_ and
_Tokenization_ engines.
-* __[Paoding](padding)__: The Paoding Analyzer is an alternative to Smartcn
for basic Chinese language support. Paoding only supports _Tokenization_ and is
therefore best used in combination with the [Smartcn](smartcn) _Sentnece
Detection_ engine.
+* __[Paoding](paoding)__: The Paoding Analyzer is an alternative to Smartcn
for basic Chinese language support. Paoding only supports _Tokenization_ and is
therefore best used in combination with the [Smartcn](smartcn) _Sentence
Detection_ engine.
* __[CELI / linguagrid.org](celi)__: CELI contributed Stanbol
EnhancementEngines based on their NLP processing Framework. It supports _Named
Entity Recognition_ for French and Italian as well as _Lemmatization_ and
lexical analysis for Italian, Danish, Russian, Romanian and Swedish. In
addition CELI also provides a Language identification service
@@ -97,20 +99,20 @@ This section provides an overview about
* __[Freeling](https://github.com/insideout10/stanbol-freeling)__: _Freeling_
is a [GPL](http://www.fsf.org/licenses/gpl.html) licensed NLP processing
framework implemented in <code>C</code>. It supports _Sentence Detection_,
_Tokenization_, _Part of Speech_ tagging, _Chunking_ and _Named Entity
Recognition_ for several languages including English, Spanish, Italian, Russian
and Portuguese.
- The integration is based on the [RESTful NLP analysis
service](restintegration) specification. That means that users will need to
install and configure Freeling and than run the [Stanbol Freeling
Server](https://github.com/insideout10/stanbol-freeling/tree/master/freeling-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` as well as the
[RESTful NLP Language Identification Engine](../engines/restfullangident) with
the `/langident` endpoint of their Stanbol Freeling Server.
+ The integration is based on the [RESTful NLP analysis
service](restfulnlpanalysisservice) specification. That means that users will
need to install and configure Freeling and then run the [Stanbol Freeling
Server](https://github.com/insideout10/stanbol-freeling/tree/master/freeling-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint as well as
the [RESTful NLP Language Identification Engine](../engines/restfullangident)
with the `/langident` endpoint of their Stanbol Freeling Server.
__NOTE__: As the license of Freeling is not compatible with the ASL this
project is hosted on
[https://github.com/insideout10/stanbol-freeling](https://github.com/insideout10/stanbol-freeling)
and is NOT a part of Apache Stanbol. Users that want to use it will need to
download and install it themselves.
* __[Talismane](https://github.com/westei/stanbol-talismane)__: Talismane is
an [AGPL](http://www.fsf.org/licenses/agpl.html) licensed NLP processing
framework implemented in Java. It supports _Sentence Detection_,
_Tokenization_ and _Part of Speech_ tagging for French.
- The integration is based on the [RESTful NLP analysis
service](restintegration) specification. That means that users will need to
download and build the
[Stanbol-Talismane](https://github.com/westei/stanbol-talismane) project and
than run the [Stanbol Talismane
Server](https://github.com/westei/stanbol-talismane/tree/master/talismane-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint of their
Stanbol-Talismane server
+ The integration is based on the [RESTful NLP analysis
service](restfulnlpanalysisservice) specification. That means that users will
need to download and build the
[Stanbol-Talismane](https://github.com/westei/stanbol-talismane) project and
then run the [Stanbol Talismane
Server](https://github.com/westei/stanbol-talismane/tree/master/talismane-server).
After that they can use this server by configuring the [RESTful NLP Analysis
Engine](../engines/restfulnlpanalysis) with the `/analysis` endpoint of their
Stanbol-Talismane server.
__NOTE__: As the license of Talismane is not compatible with the ASL this
project is hosted on
[https://github.com/westei/stanbol-talismane](https://github.com/westei/stanbol-talismane)
and is NOT a part of Apache Stanbol. Users that want to use it will need to
download and install it themselves.
### Supported Languages
-* __Catalan__ _(ça)_
+* __Catalan__ _(ca)_
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and basic _NER_ without
classification
* __Chinese__ _(zh)_
@@ -118,27 +120,27 @@ This section provides an overview about
* [Paoding](paoding): _Tokenization_
* __Danish__ _(da)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Dutch__ _(nl)_
* [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
and full _NER_ for Persons, Organizations and Places
* __English__ _(en)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_
tagging, _Chunking_ and full _NER_ for Persons, Organizations and Places
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging,
_Chunking_ and full _NER_ for Persons, Organizations and Places
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and full _NER_ for
Persons, Organizations and Places
- * [OpenCalais](../engines/opencalaisengine): __NER__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __French__ _(fr)_
* [Talismane](https://github.com/westei/stanbol-talismane): _Sentence
Detection_, _Tokenization_ and _Part of Speech_ tagging
* [CELI](celi): _NER_
- * [OpenCalais](../engines/opencalaisengine): __Named Entity Recoqunition__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __Galician__ _(gl)_
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and _NER_ but without
classification
* __German__ _(de)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
including Proper Noun support and _Chunking_ (only Noun phrases)
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
including Proper Noun support and _Chunking_ (only Noun phrases)
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Italian__ _(it)_
@@ -160,9 +162,9 @@ This section provides an overview about
* [CELI](celi): _Lemmatization_ and lexical analysis
* __Spanish__ _(es)_
- * [OpenNLP] (opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
(no Proper Noun support) and _NER_ for Persons, Organizations and Places
+ * [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_, _POS_ tagging
(no Proper Noun support) and _NER_ for Persons, Organizations and Places
* [Freeling](https://github.com/insideout10/stanbol-freeling): _Sentence
Detection_, _Tokenization_, _POS_ tagging, _Chunking_ and full _NER_ for
Persons, Organizations and Places
- * [OpenCalais](../engines/opencalaisengine): __NER__
+ * [OpenCalais](../engines/opencalaisengine): _NER_
* __Swedish__ _(sv)_
* [OpenNLP](opennlp): _Sentence Detection_, _Tokenization_ and _POS_
tagging.
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/opennlp.mdtext
Wed Jan 30 13:21:36 2013
@@ -44,7 +44,7 @@ Users that want to process texts by usin
opennlp-ner
{your-named-entity-linking}
-where `{your-named-entity-linking}` refers to an instance of the
[NamedEntityLinkingEngine](../engines/namedentitytaggingengine) configured for
the users controlled vocabulary. Users can also use multiple
NamedEntityLinkingEngines configuration in the same chain. Users that want to
use NER models for other types than Persons, Organizations or Places will need
to use the [CustomNerModelEngine](../engines/customnermodelengine.mdtext)
instead of the `opennlp-ner` engine.
+where `{your-named-entity-linking}` refers to an instance of the
[NamedEntityLinkingEngine](../engines/namedentitytaggingengine) configured for
the user's controlled vocabulary. Users can also use multiple
NamedEntityLinkingEngine configurations in the same chain. Users that want to
use NER models for other types than Persons, Organizations or Places will need
to use the [CustomNerModelEngine](../engines/opennlpcustomner) instead of the
`opennlp-ner` engine.
Note that the use of the `opennlp-token` and `opennlp-sentence` engines is
optional, as the `opennlp-ner` engine will do those steps itself in case tokens
and sentences are not yet available. Including those engines explicitly in the
chain is only required in cases where custom configurations for the tokenizer
and sentence detection engines (e.g. custom OpenNLP models) need to be applied.
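For illustration, the chain layout described above could be written down as an
OSGi configuration for a Stanbol List Chain. This is only a sketch: the
configuration file name, the chain name and the `my-entity-linking` engine name
are illustrative assumptions, so check the Enhancement Chain documentation for
the exact property keys before using it:

    :::ini
    # File name is an assumption, e.g.:
    # org.apache.stanbol.enhancer.chain.list.impl.ListChain-ner.config
    stanbol.enhancer.chain.name="opennlp-ner-chain"
    stanbol.enhancer.chain.list.enginelist=["opennlp-sentence","opennlp-token","opennlp-ner","my-entity-linking"]

The engines run in list order, so sentence detection and tokenization results
are available before `opennlp-ner` and the linking engine execute.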
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/paoding.mdtext
Wed Jan 30 13:21:36 2013
@@ -9,7 +9,7 @@ The integration of the Stanbol NLP proce
* Tokenize parsed Chinese Text
* Tokenize Chinese labels of entities in the controlled vocabulary.
-It is highly recommended to use the Paoding Analyzer in combination with the
[Smartcn](smatcn) as the Smartcn Analyzer provide Sentence detection.
+It is highly recommended to use the Paoding Analyzer in combination with
[Smartcn](smartcn), as the Smartcn Analyzer provides Sentence Detection.
Installation
@@ -60,18 +60,18 @@ To use the Paoding Analyzer for Chinese
1. the fieldType specification for Chinese
- :::xml
- <fieldType name="text_zh" class="solr.TextField">
- <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer"/>
- </fieldType>
+ :::xml
+ <fieldType name="text_zh" class="solr.TextField">
+ <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer"/>
+ </fieldType>
2. A dynamic field using this field type that matches against Chinese language
literals
- :::xml
- <!--
- Dynamic field for Chinese languages.
- -->
- <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
+ :::xml
+ <!--
+ Dynamic field for Chinese languages.
+ -->
+ <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
@@ -104,4 +104,4 @@ If you want to create an empty SolrYard
If you want to use the paoding.solrindex.zip as default you can rename the
file in the datafiles folder to "default.solrindex.zip" and then enable the
"Use default SolrCore configuration"
(org.apache.stanbol.entityhub.yard.solr.useDefaultConfig) option when you
configure a SolrYard instance.
-See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites)).
\ No newline at end of file
+See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites).
\ No newline at end of file
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext?rev=1440398&r1=1440397&r2=1440398&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/nlp/smartcn.mdtext
Wed Jan 30 13:21:36 2013
@@ -47,30 +47,30 @@ For that you will need to add two things
1. A fieldType specification for Chinese
- :::xml
- <fieldType name="text_zh" class="solr.TextField"
positionIncrementGap="100">
- <analyzer type="index">
- <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
- <filter class="solr.SmartChineseWordTokenFilterFactory"/>
- <filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
- </analyzer>
- <analyzer type="query">
- <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
- <filter class="solr.SmartChineseWordTokenFilterFactory"/>
- <filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
- <filter class="solr.PositionFilterFactory" />
- </analyzer>
- </fieldType>
-
+ :::xml
+ <fieldType name="text_zh" class="solr.TextField"
positionIncrementGap="100">
+ <analyzer type="index">
+ <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
+ <filter class="solr.SmartChineseWordTokenFilterFactory"/>
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+ </analyzer>
+ <analyzer type="query">
+ <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
+ <filter class="solr.SmartChineseWordTokenFilterFactory"/>
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+ <filter class="solr.PositionFilterFactory" />
+ </analyzer>
+ </fieldType>
+
2. A dynamic field using this field type that matches against Chinese language
literals
- :::xml
- <!--
- Dynamic field for Chinese languages.
- -->
- <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
+ :::xml
+ <!--
+ Dynamic field for Chinese languages.
+ -->
+ <dynamicField name="@zh*" type="text_zh" indexed="true" stored="true"
multiValued="true" omitNorms="false"/>
The
[smartcn.solrindex.zip](https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/smartcn.solrindex.zip)
is identical with the default configuration but uses the above fieldType and
dynamicField specification.
@@ -94,4 +94,4 @@ If you want to create an empty SolrYard
If you want to use the smartcn.solrindex.zip as default you can rename the
file in the datafiles folder to "default.solrindex.zip" and then enable the
"Use default SolrCore configuration"
(org.apache.stanbol.entityhub.yard.solr.useDefaultConfig) option when you
configure a SolrYard instance.
-See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites)).
+See also the documentation on how to [configure a managed
site](http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite#configuration-of-managedsites).
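The "rename to default" step described for both paoding.solrindex.zip and
smartcn.solrindex.zip boils down to copying the archive next to the launcher.
A minimal shell sketch, run here against a temporary directory because the
real datafiles location depends on the launcher's working directory (the path
is an assumption, not part of this commit):

```shell
# Make a language-specific SolrYard index configuration the default one.
# A real installation would use the launcher's datafiles folder instead
# of the temporary directory created here.
DATAFILES=$(mktemp -d)
touch "$DATAFILES/smartcn.solrindex.zip"   # stand-in for the real archive
cp "$DATAFILES/smartcn.solrindex.zip" "$DATAFILES/default.solrindex.zip"
ls "$DATAFILES"
```

With the copy in place, enabling "Use default SolrCore configuration" on a
SolrYard instance picks up this archive.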