Author: rwesten
Date: Mon Oct 27 15:19:11 2014
New Revision: 1634568
URL: http://svn.apache.org/r1634568
Log:
added Enhancement Engine Documentation for STANBOL-1397
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png   (with props)
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1634568&r1=1634567&r2=1634568&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext Mon Oct 27 15:19:11 2014
@@ -213,9 +213,9 @@ Apache Stanbol provide a core implementa
### Others
-* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
-    * converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
-    * generated RDF uses the NIF (NLP Interchange Format)
+* __[NIF 2.0 Transformation Engine](nif20)__ serializes low-level NLP results as RDF
+    * [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) stands for NLP Interchange Format. It defines an RDF schema for describing sentences, phrases, words and their NLP annotations.
+    * This engine makes detailed NLP results available as RDF; otherwise this information is typically only accessible through the Java API of the [Analysed Text](../nlp/analyzedtext) content part.
## Deprecated
@@ -227,6 +227,13 @@ Enhancement Engines listed below are no
* supports multiple languages
 * detects occurrences of untyped entities as concepts, takes local taxonomies as linking target
+* _NLP 2 RDF Engine:_ __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
+    * replaced by the __[NIF 2.0 Transformation Engine](nif20)__, which supports version 2.0 of the NIF standard, whereas this engine is based on NIF 1.0
+    * converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
+    * generated RDF uses the NIF (NLP Interchange Format)
+
+
+
 * _CachingDereferencerEngine_ __deprecated__ (see dereferencing support of individual engines as well as [STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
* retrieves additional content for presenting the enhancement results.
Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext?rev=1634568&view=auto
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext (added)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20.mdtext Mon Oct 27 15:19:11 2014
@@ -0,0 +1,193 @@
+Title: NIF 2.0 Transformation Engine
+
+Low-level NLP results are typically not included in the RDF enhancement results. This engine serializes such results using the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/) (NLP Interchange Format) standard.
+
+## Processed Information (Input)
+
+Apache Stanbol manages NLP results in the [Analysed Text](../nlp/analyzedtext) content part, which provides a Java API for accessing them. This engine reads that information and transforms it according to the [NIF 2.0](http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html) core ontology.
+
+If a ContentItem does not contain this content part, the engine does not process it.
+
+## Created RDF
+
+The engine serializes the following information:
+
+* Segment URIs using the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme
+* Selector information like `nif:beginIndex`, `nif:endIndex` as well as `nif:before`, `nif:anchorOf` and `nif:after`. For spans longer than 100 characters, the `nif:head` property is used instead of `nif:anchorOf`.
+* Context information: `nif:referenceContext` links for all Strings as well as additional metadata for the context.
+* String hierarchies: `nif:sub-/nif:superString`, `nif:sentence`
+* String navigation: `nif:next-/nif:previousSentence`, `nif:next-/nif:previousWord`
+* String annotations: `nif:oliaCategory`, `nif:oliaConf` and `nif:posTag`
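
The URI and selector rules in the list above can be illustrated with a minimal Python sketch. It is not the engine's (Java) implementation: the function names are hypothetical, `HEAD_LENGTH` is an assumed value (the text only says that `nif:head` replaces `nif:anchorOf` for spans longer than 100 characters), and `nif:before`/`nif:after` are left out because their window size is not specified here.

```python
# Hypothetical sketch of the segment URI and selector rules described above;
# not the engine's actual code.
MAX_ANCHOR_LENGTH = 100  # from the text: longer spans use nif:head
HEAD_LENGTH = 10         # ASSUMPTION: the real nif:head length is not documented here


def context_uri(base: str) -> str:
    """RFC 5147 URI used for the nif:Context, e.g. '<base>#char=0'."""
    return f"{base}#char=0"


def segment_uri(base: str, begin: int, end: int) -> str:
    """RFC 5147 URI for the character span [begin, end) of the text."""
    return f"{base}#char={begin},{end}"


def selector(text: str, begin: int, end: int) -> dict:
    """Selector properties for one segment (nif:before/nif:after omitted)."""
    props = {"nif:beginIndex": begin, "nif:endIndex": end}
    span = text[begin:end]
    if len(span) > MAX_ANCHOR_LENGTH:
        # Only the first characters are kept for long spans.
        props["nif:head"] = span[:HEAD_LENGTH]
    else:
        props["nif:anchorOf"] = span
    return props


if __name__ == "__main__":
    base = "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e"
    text = "The Apache Stanbol Enhancer can detect entities in text."
    print(segment_uri(base, 4, 10))  # ...#char=4,10
    print(selector(text, 4, 10))
    # {'nif:beginIndex': 4, 'nif:endIndex': 10, 'nif:anchorOf': 'Apache'}
```

With the prefix `content:` bound to `<base>#`, the URI printed above is the `content:char=4,10` segment shown in the examples further down.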
+
+### Configuration
+
+The engine provides several switches to enable or disable the serialization of specific NIF information. Multiple instances of the engine can be configured, each with different settings. The following figure shows the configuration dialog:
+
+![NIF 2.0 Transformation Engine configuration dialog](nif20config.png)
+
+* __Selector__ _(enhancer.engines.nlp2rdf.selector)_: Enables/disables the serialization of selector-related properties such as `nif:beginIndex`, `nif:endIndex`, `nif:before`, `nif:anchorOf` and `nif:after`. If disabled, clients can still parse the start/end indexes from the [RFC 5147](http://tools.ietf.org/html/rfc5147) encoded segment URI.
+* __Hierarchy__ _(enhancer.engines.nlp2rdf.hierarchy)_: Enables/disables writing of hierarchical links. This includes the `nif:sentence`, `nif:superString` and `nif:subString` properties.
+* __Previous and Next Links__ _(enhancer.engines.nlp2rdf.previousNext)_: Enables/disables the serialization of links to the previous/next sentence/word.
+* __Context only URI Scheme__ _(enhancer.engines.nlp2rdf.cotextOnlyUriScheme)_: If enabled, the `nif:RFC5147String` type, which identifies the [RFC 5147](http://tools.ietf.org/html/rfc5147) URI scheme, is added only to the `rdf:type` of the `nif:Context`. If disabled, the `nif:RFC5147String` `rdf:type` is added to all segments.
+* __String Type__ _(enhancer.engines.nlp2rdf.writeStringType)_: If enabled, the `nif:String` type is added to all serialized segments. If disabled, only more specific types like `nif:Sentence` or `nif:Word` are used.
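
When the Selector switch is disabled, a client has to recover the offsets from the segment URI itself. The following Python sketch shows one hypothetical way to do that. It only handles the `char=` forms that appear in the examples on this page (`#char=begin,end` for segments and `#char=0` for the context) and drops optional RFC 5147 integrity checks such as `;length=...`.

```python
from typing import Optional, Tuple


def parse_char_fragment(uri: str) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: offsets encoded in an RFC 5147 'char=' URI.

    Returns (begin, end) for '...#char=begin,end' and (position, None)
    for a single position such as the context URI '...#char=0'.
    """
    _, sep, fragment = uri.partition("#")
    if not sep or not fragment.startswith("char="):
        raise ValueError(f"not an RFC 5147 'char=' URI: {uri!r}")
    # Drop optional integrity checks, e.g. 'char=0,56;length=56'.
    spec = fragment[len("char="):].split(";", 1)[0]
    begin, comma, end = spec.partition(",")
    if not comma:
        return int(begin), None
    # Open ranges such as 'char=5,' make int() raise ValueError (not handled here).
    return int(begin), int(end)


print(parse_char_fragment(
    "urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#char=4,10"))
# (4, 10)
```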
+
+### Examples
+
+This section provides examples of RDF generated by this engine. The NLP annotations were created with OpenNLP for the sentence `The Apache Stanbol Enhancer can detect entities in text.` The following snippet lists the prefixes used in the examples:
+
+    :::text
+    @prefix content: <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e#> .
+    @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
+    @prefix olia: <http://purl.org/olia/olia.owl#> .
+    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+The following snippet shows the `nif:Context` instance. All segments reference it via `nif:referenceContext`, and it refers to the URI of the ContentItem via `nif:sourceUrl`.
+
+    :::text
+    content:char=0
+        a nif:Context , nif:RFC5147String ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:sourceUrl
+            <urn:content-item-sha1-be57a50b7f82854460c2ff33a65637e36befe48e> .
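
Because every segment references this context, a client can cross-check the offsets. In the examples on this page, each segment's `nif:anchorOf` is the context text sliced from `nif:beginIndex` (inclusive) to `nif:endIndex` (exclusive). The Python sketch below checks this for offsets copied by hand from those examples; the dictionary is a stand-in for parsed RDF, not a Stanbol API.

```python
from typing import Dict, Tuple

CONTEXT = "The Apache Stanbol Enhancer can detect entities in text."

# (nif:beginIndex, nif:endIndex) -> nif:anchorOf, copied from the examples
SEGMENTS: Dict[Tuple[int, int], str] = {
    (0, 56): "The Apache Stanbol Enhancer can detect entities in text.",
    (0, 3): "The",
    (4, 10): "Apache",
    (11, 18): "Stanbol",
    (28, 38): "can detect",
    (32, 38): "detect",
}


def anchor_of(context: str, begin: int, end: int) -> str:
    """Segment text: begin is inclusive, end is exclusive."""
    return context[begin:end]


mismatches = [span for span, anchor in SEGMENTS.items()
              if anchor_of(CONTEXT, *span) != anchor]
print(mismatches)  # []
```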
+
+The next snippet shows the segment for the only sentence in the example text.
+
+    :::text
+    content:char=0,56
+        a nif:RFC5147String , nif:Sentence ;
+        nif:anchorOf
+            "The Apache Stanbol Enhancer can detect entities in text."@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "56"^^xsd:int ;
+        nif:firstWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 .
+
+The following snippet shows the segments for the first three words of the sentence.
+
+    :::text
+    content:char=0,3
+        a nif:RFC5147String , nif:Word ;
+        nif:anchorOf
+            "The"@en ;
+        nif:beginIndex
+            "0"^^xsd:int ;
+        nif:endIndex
+            "3"^^xsd:int ;
+        nif:nextWord
+            content:char=4,10 ;
+        nif:oliaCategory
+            olia:Determiner , olia:PronounOrDeterminer ;
+        nif:oliaConf
+            "0.9662179110607207"^^xsd:double ;
+        nif:posTag
+            "DT"^^xsd:string ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=4,10
+        a nif:RFC5147String , nif:Word ;
+        nif:anchorOf
+            "Apache"@en ;
+        nif:beginIndex
+            "4"^^xsd:int ;
+        nif:endIndex
+            "10"^^xsd:int ;
+        nif:nextWord
+            content:char=11,18 ;
+        nif:oliaCategory
+            olia:Noun , olia:PluralQuantifier , olia:ProperNoun , olia:Quantifier ;
+        nif:oliaConf
+            "0.7882547205652428"^^xsd:double ;
+        nif:posTag
+            "NNPS"^^xsd:string ;
+        nif:previousWord
+            content:char=0,3 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=0,10 .
+
+    content:char=11,18
+        a nif:RFC5147String , nif:Word ;
+        nif:anchorOf
+            "Stanbol"@en ;
+        nif:beginIndex
+            "11"^^xsd:int ;
+        nif:endIndex
+            "18"^^xsd:int ;
+        nif:nextWord
+            content:char=19,27 ;
+        nif:oliaCategory
+            olia:Noun , olia:ProperNoun , olia:Quantifier , olia:SingularQuantifier ;
+        nif:oliaConf
+            "0.701014272348203"^^xsd:double ;
+        nif:posTag
+            "NNP"^^xsd:string ;
+        nif:previousWord
+            content:char=4,10 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=11,27 .
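
A client can reconstruct word order by starting at the sentence's `nif:firstWord` and following `nif:nextWord`. The Python sketch below uses plain dictionaries filled with the links shown in the examples (a stand-in for parsed RDF, not an API). The segment `content:char=19,27` is referenced but not shown, so the walk stops there.

```python
from typing import Dict, List, Set

# Links copied from the Turtle examples (stand-in for parsed RDF).
FIRST_WORD: Dict[str, str] = {"content:char=0,56": "content:char=0,3"}
NEXT_WORD: Dict[str, str] = {
    "content:char=0,3": "content:char=4,10",
    "content:char=4,10": "content:char=11,18",
    "content:char=11,18": "content:char=19,27",
}
ANCHOR_OF: Dict[str, str] = {
    "content:char=0,3": "The",
    "content:char=4,10": "Apache",
    "content:char=11,18": "Stanbol",
}


def words_of(sentence: str) -> List[str]:
    """Anchors of a sentence's words in order, as far as the data reaches."""
    words: List[str] = []
    seen: Set[str] = set()  # guards against cyclic nif:nextWord links in bad data
    current = FIRST_WORD.get(sentence)
    while current is not None and current in ANCHOR_OF and current not in seen:
        seen.add(current)
        words.append(ANCHOR_OF[current])
        current = NEXT_WORD.get(current)
    return words


print(words_of("content:char=0,56"))  # ['The', 'Apache', 'Stanbol']
```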
+
+Phrases are exported as RDF as well. The following snippet shows a verb phrase together with the segment for the verb `detect`, which links to the phrase using `nif:subString`.
+
+    :::text
+    content:char=28,38
+        a nif:Phrase , nif:RFC5147String ;
+        nif:anchorOf
+            "can detect"@en ;
+        nif:beginIndex
+            "28"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:oliaCategory
+            olia:VerbPhrase ;
+        nif:oliaConf
+            "0.9864510669287669"^^xsd:double ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:superString
+            content:char=0,56 .
+
+    content:char=32,38
+        a nif:RFC5147String , nif:Word ;
+        nif:anchorOf
+            "detect"@en ;
+        nif:beginIndex
+            "32"^^xsd:int ;
+        nif:endIndex
+            "38"^^xsd:int ;
+        nif:nextWord
+            content:char=39,47 ;
+        nif:oliaCategory
+            olia:Infinitive , olia:Verb ;
+        nif:oliaConf
+            "0.9930989756397197"^^xsd:double ;
+        nif:posTag
+            "VB"^^xsd:string ;
+        nif:previousWord
+            content:char=28,31 ;
+        nif:referenceContext
+            content:char=0 ;
+        nif:sentence
+            content:char=0,56 ;
+        nif:subString
+            content:char=28,38 .
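
In the output shown here, each word links to an enclosing segment (such as the verb phrase) via `nif:subString`, so listing the words of each segment means inverting those links. A minimal Python sketch, using the word-to-segment `nif:subString` links copied from the examples on this page (a stand-in for parsed RDF; the word `can` at 28,31 is not shown, so only `detect` appears under the verb phrase):

```python
from collections import defaultdict
from typing import Dict, List

# word segment -> segment it links to via nif:subString (from the examples)
WORD_TO_PHRASE: Dict[str, str] = {
    "content:char=0,3": "content:char=0,10",
    "content:char=4,10": "content:char=0,10",
    "content:char=11,18": "content:char=11,27",
    "content:char=32,38": "content:char=28,38",
}


def words_by_phrase(links: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert word -> segment links into segment -> [words], keeping input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, phrase in links.items():
        grouped[phrase].append(word)
    return dict(grouped)


for phrase, words in words_by_phrase(WORD_TO_PHRASE).items():
    print(phrase, words)
```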
Added: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png?rev=1634568&view=auto
==============================================================================
Binary file - no diff available.
Propchange: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/nif20config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream