Author: rwesten
Date: Mon Oct 27 15:19:11 2014
New Revision: 1634568

URL: http://svn.apache.org/r1634568
Log:
added Enhancement Engine Documentation for STANBOL-1397

Added:
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
   (with props)
Modified:
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext

Modified: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1634568&r1=1634567&r2=1634568&view=diff
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext 
(original)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext 
Mon Oct 27 15:19:11 2014
@@ -213,9 +213,9 @@ Apache Stanbol provide a core implementa
 
 ### Others
 
-* _NLP 2 RDF Engine:_ __under development__ (see 
[STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
-       * converts NLP processing results stored in the 
[AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the 
metadata of the [ContentItem](../contentitem)
-       * generated RDF uses the NIF (NLP Interchange Format)
+* __[NIF 2.0 Transformation Engine](nif20)__ serializes low-level NLP 
results as RDF
+    * [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) stands for NLP 
Interchange Format. It defines an RDF schema for describing sentences, 
phrases, words and their NLP annotations.
+    * This engine exposes detailed NLP results that are 
typically only available through the Java API of the [Analysed 
Text](../nlp/analyzedtext) content part.
 
 
 ## Deprecated
@@ -227,6 +227,13 @@ Enhancement Engines listed below are no 
        * supports multiple languages
        * detects occurrences of untyped entities as concepts, takes local 
taxonomies as linking target 
 
+* _NLP 2 RDF Engine:_ __under development__ (see 
[STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
+    * replaced by the __[NIF 2.0 Transformation Engine](nif20)__, which 
supports version 2.0 of the NIF standard, while this engine is based on NIF 1.0
+       * converts NLP processing results stored in the 
[AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the 
metadata of the [ContentItem](../contentitem)
+       * generated RDF uses the NIF (NLP Interchange Format)
+
+
+
 * _CachingDereferencerEngine_ __deprecated__ (see dereferencing support of 
individual engines as well as  
[STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
        * retrieves additional content for presenting the enhancement results.
 

Added: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext?rev=1634568&view=auto
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext 
(added)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext 
Mon Oct 27 15:19:11 2014
@@ -0,0 +1,193 @@
+Title: NIF 2.0 Transformation Engine
+
+Typically, low-level NLP results are not included in the RDF enhancement 
results. This engine supports the serialization of such results using the 
[NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) (NLP Interchange Format) 
 standard.
+
+## Processed Information (Input)
+
+Apache Stanbol manages NLP results in the [Analysed Text](../nlp/analyzedtext) 
content part. This ContentPart provides a Java API for accessing those results. 
This engine reads that information and transforms it according to the [NIF 
2.0](http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html)
 core ontology. 
+
+If a ContentItem does not contain this content part it will not be processed 
by this engine.
+
+## Created RDF
+
+The engine serializes the following information:
+
+* Segment URIs by using the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI 
scheme
+* Selector information like `nif:beginIndex`, `nif:endIndex` as well as 
`nif:before`, `nif:anchorOf` and `nif:after`. For spans longer than 100 
characters the `nif:head` property is used instead of `nif:anchorOf`.
+* Context information: This includes `nif:referenceContext` links for all 
Strings as well as additional metadata for the context.
+* String hierarchies: `nif:sub-/nif:superWord`, `nif:sentence`
+* String navigation: `nif:next-/nif:previousSentence`, 
`nif:next-/nif:previousWord`
+* String annotations: `nif:oliaCategory`, `nif:oliaConf` and `nif:posTag`
+
+### Configuration
+
+The engine provides several switches that enable or disable the 
serialization of specific NIF information. Multiple instances of the engine 
can be configured, each with its own settings. The following figure shows 
the configuration dialog:
+
+![NIF2.0 Engine Configuration](nif20config.png)
+
+* __Selector__ _(enhancer.engines.nlp2rdf.selector)_: Enables or disables 
the serialization of selector-related properties such as `nif:beginIndex`, 
`nif:endIndex`, `nif:before`, `nif:anchorOf` and `nif:after`. If disabled, 
clients can still parse the start/end indexes from the [RFC 
5147](http://tools.ietf.org/html/rfc5147) encoded segment URI.
+* __Hierarchy__ _(enhancer.engines.nlp2rdf.hierarchy)_: Enables or 
disables the writing of hierarchical links. This includes the `nif:sentence`, 
`nif:superString` and `nif:subString` properties.
+* __Previous and Next Links__ _(enhancer.engines.nlp2rdf.previousNext)_: 
Enables or disables the serialization of links to the previous/next 
sentence/word.
+* __Context only URI Scheme__ 
_(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)_: If enabled the used [RFC 
5147](http://tools.ietf.org/html/rfc5147) URI scheme is added only to the 
`rdf:type` of the `nif:Context`. If disabled the `nif:RFC5147String` `rdf:type` 
is added to all segments.
+* __String Type__ _(enhancer.engines.nlp2rdf.writeStringType)_: If enabled the 
`nif:String` type is added to all serialized segments. If disabled only more 
specific types like `nif:Sentence` or `nif:Word` are used.
+
+### Examples
+
+This section provides some examples of RDF generated by this engine. OpenNLP 
was used to create the serialized NLP annotations. The sentence `The Apache 
Stanbol Enhancer can detect entities in text.` was used for generating this 
example.
+
+    :::text
+    @prefix content: 
<urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
+    @prefix nif: 
<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
+    @prefix olia: <http://purl.org/olia/olia.owl#> .
+    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+The first Turtle snippet shows the `nif:Context` instance. It is referenced 
by all segments and refers to the URI of the ContentItem via the 
`nif:sourceUrl` property.
+
+    :::text
+    content:char=0
+        a nif:Context ,  nif:RFC5147String ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:sourceUrl
+            <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
+
+Next is the segment describing the only sentence in the example text.
+
+    :::text
+    content:char=0,56
+        a nif:RFC5147String ,  nif:Sentence ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:firstWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 .
+
+The following snippet shows the segments for the first three words of the 
Sentence.
+
+    :::text
+    content:char=0,3
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "The"@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "3"^^xsd:int ;
+        nif:nextWord
+            content:char=4,10 ;
+        nif:oliaCategory
+             olia:Determiner ,  olia:PronounOrDeterminer ;
+        nif:oliaConf
+            "0.9662179110607207"^^xsd:double ;
+        nif:posTag
+            "DT"^^xsd:string ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=4,10
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "Apache"@en ;
+        nif:beginIndex
+            "4"^^xsd:int ;
+        nif:endIndex
+            "10"^^xsd:int ;
+        nif:nextWord
+            content:char=11,18 ;
+        nif:oliaCategory
+             olia:Noun ,  olia:PluralQuantifier ,  olia:ProperNoun ,  
olia:Quantifier ;
+        nif:oliaConf
+            "0.7882547205652428"^^xsd:double ;
+        nif:posTag
+            "NNPS"^^xsd:string ;
+        nif:previousWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=11,18
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "Stanbol"@en ;
+        nif:beginIndex
+            "11"^^xsd:int ;
+        nif:endIndex
+            "18"^^xsd:int ;
+        nif:nextWord
+            content:char=19,27 ;
+        nif:oliaCategory
+             olia:Noun ,  olia:ProperNoun ,  olia:Quantifier ,  
olia:SingularQuantifier ;
+        nif:oliaConf
+            "0.701014272348203"^^xsd:double ;
+        nif:posTag
+            "NNP"^^xsd:string ;
+        nif:previousWord
+            content:char=4,10 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=11,27 .
+
+Phrases are also exported as RDF. Here is an example of a verb phrase. Also 
included is the segment for the verb, which links to the phrase using 
`nif:subString`.
+
+    :::text
+    content:char=28,38
+        a nif:Phrase ,  nif:RFC5147String ;
+        nif:anchorOf
+            "can detect"@en ;
+        nif:beginIndex
+            "28"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:oliaCategory
+             olia:VerbPhrase ;
+        nif:oliaConf
+            "0.9864510669287669"^^xsd:double ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:superString
+            content:char=0,56 .
+
+    content:char=32,38
+        a nif:RFC5147String ,  nif:Word ;
+        nif:anchorOf
+            "detect"@en ;
+        nif:beginIndex
+            "32"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:nextWord
+            content:char=39,47 ;
+        nif:oliaCategory
+             olia:Infinitive ,  olia:Verb ;
+        nif:oliaConf
+            "0.9930989756397197"^^xsd:double ;
+        nif:posTag
+            "VB"^^xsd:string ;
+        nif:previousWord
+            content:char=28,31 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=28,38 .
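
The documentation added in this revision notes that clients can still parse the start/end indexes from the RFC 5147 encoded segment URI when the selector properties are disabled. A minimal Python sketch of such client-side parsing is given here; it is not part of the committed files. The function names `parse_char_fragment` and `anchor_of` are hypothetical, and only the `char=<start>[,<end>]` forms that appear in the examples are handled. Treating a bare `char=<start>` as "from start to the end of the text" is an assumption based on the `nif:Context` example, not something RFC 5147 prescribes.

```python
import re
from typing import Optional, Tuple

# Matches only the simple "char=<start>[,<end>]" fragments seen in the examples.
# RFC 5147 also defines line-based schemes and integrity checks; those are
# deliberately not handled in this sketch.
_CHAR_FRAGMENT = re.compile(r"^char=(\d+)(?:,(\d+))?$")


def parse_char_fragment(uri: str) -> Tuple[int, Optional[int]]:
    """Return (start, end) from a segment URI; end is None if absent."""
    _, sep, fragment = uri.partition("#")
    if not sep:
        raise ValueError(f"no fragment in URI: {uri!r}")
    match = _CHAR_FRAGMENT.match(fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return start, end


def anchor_of(text: str, uri: str) -> str:
    """Recover the selected string from the plain text (assumption: a
    missing end index means 'up to the end of the text')."""
    start, end = parse_char_fragment(uri)
    return text[start:] if end is None else text[start:end]


# Values taken from the example in nif20.mdtext.
TEXT = "The Apache Stanbol Enhancer can detect entities in text."
BASE = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e"

for fragment in ("char=0,56", "char=4,10", "char=28,38"):
    uri = f"{BASE}#{fragment}"
    print(fragment, parse_char_fragment(uri), repr(anchor_of(TEXT, uri)))
```

With the selector switch disabled, a client along these lines could rebuild the `nif:beginIndex`, `nif:endIndex` and `nif:anchorOf` values from the segment URI and the plain text of the ContentItem.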

Added: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png?rev=1634568&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

