Author: rwesten
Date: Wed Jan 30 13:41:52 2013
New Revision: 1440412
URL: http://svn.apache.org/viewvc?rev=1440412&view=rev
Log:
moved information from customnermodelengine to opennlpcustomner
Removed:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/customnermodelengine.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpcustomner.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1440412&r1=1440411&r2=1440412&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
Wed Jan 30 13:41:52 2013
@@ -84,7 +84,7 @@ NER engines need to write detected Named
* detects occurrences of persons, places and organizations only
* supports [NER
annotations](../nlp/nlpannotations#name-entity-ner-annotations)
-* __[Custom NER Model Extraction Enhancement
Engine](customnermodelengine.html):__
+* __[OpenNLP Custom NER Model Engine](opennlpcustomner):__
* NLP processing using OpenNLP NER
* uses custom NameFinder models (user configured)
* supports custom Named Entity types (other than persons, places and
organizations)
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpcustomner.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpcustomner.mdtext?rev=1440412&r1=1440411&r2=1440412&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpcustomner.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpcustomner.mdtext
Wed Jan 30 13:41:52 2013
@@ -0,0 +1,53 @@
+Title: The OpenNLP Custom NER Model Extraction Engine
+
+This engine allows the configuration of custom [Apache
OpenNLP](http://opennlp.apache.org) NameFinder models for NER of plain text
content.
+
+
+## Example Result
+
+This engine adds
[fise:TextAnnotation](../enhancementstructure.html#fisetextannotation) enhancements for the
processed plain text to the metadata of the content item. The following code
listing shows a DNA-type Named Entity detected by an OpenNLP NameFinder
model trained on the
[BioNLP2004](http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html) dataset:
+
+    :::json
+    {
+        "@subject": "urn:enhancement-0e31eb01-23c5-82b5-1372-5c5606c09960",
+        "@type": [
+            "Enhancement",
+            "TextAnnotation"
+        ],
+        "confidence": 0.40148407,
+        "creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine",
+        "start": 228,
+        "end": 242,
+        "extracted-from": "urn:content-item-sha1-84a30aeeb073be543f7c54266e232aae572efac0",
+        "selected-text": {
+            "@language": "en",
+            "@literal": "HIV-2 enhancer"
+        },
+        "selection-context": {
+            "@language": "en",
+            "@literal": "activation of the HIV-2 enhancer in monocytes and T cells"
+        },
+        "type": "http://www.bootstrep.eu/ontology/GRO#DNA"
+    },
+
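As a reading aid for the listing above, the following minimal Python sketch checks how `start`, `end`, `selected-text` and `selection-context` relate to each other. It assumes that `start` and `end` are character offsets with an exclusive end, which the example values are consistent with but which this page does not state. The helper name `span_is_consistent` is hypothetical and not part of Stanbol.

```python
# The relevant fields of the example TextAnnotation shown above.
annotation = {
    "start": 228,
    "end": 242,
    "selected-text": {"@language": "en", "@literal": "HIV-2 enhancer"},
    "selection-context": {
        "@language": "en",
        "@literal": "activation of the HIV-2 enhancer in monocytes and T cells",
    },
    "type": "http://www.bootstrep.eu/ontology/GRO#DNA",
}


def span_is_consistent(a):
    """Hypothetical check: offsets match the selected text and its context.

    Assumption: 'start'/'end' are character offsets, 'end' exclusive.
    """
    selected = a["selected-text"]["@literal"]
    context = a["selection-context"]["@literal"]
    length_matches = a["end"] - a["start"] == len(selected)
    return length_matches and selected in context


print(span_is_consistent(annotation))
```
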
+## Configuration
+
+Using this engine requires creating a service configuration.
Each configuration must specify at least one NameFinderModel name.
+
+### Parameters
+
+* __Name Finder Models__ _(stanbol.engines.opennlp-ner.nameFinderModels)_: The
list of custom NameFinderModels used by this engine. The engine supports
Arrays, Vectors and comma-separated strings. Values are the file names of
the NameFinderModel files. Configured files are loaded using the
DataFileProvider service. This means that files copied into the 'datafiles'
folder (by default located at '{stanbol-working-dir}/stanbol/datafiles') can
be loaded by the engine.
+* __Named Entity to 'dc:type' Mappings__
_(stanbol.engines.opennlp-ner.typeMappings)_: This configuration uses the
syntax "{named-entity-type} > {uri}": {named-entity-type} must match the
name used for the named entity type in the OpenNLP NameFinder model. {uri}
MUST BE a valid URI and is used as the dc:type value for fise:TextAnnotations
created by the engine for extracted Named Entities. NOTE that TextAnnotations
for unmapped Named Entity types will have no dc:type information.
+
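The "{named-entity-type} > {uri}" syntax of the type mappings can be illustrated with a short Python sketch. This is not how the engine itself parses the value; `parse_type_mapping` is a hypothetical helper, and "valid URI" is approximated here as "has a scheme", which is an assumption of this example.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_type_mapping(mapping: str) -> Tuple[str, str]:
    """Split '{named-entity-type} > {uri}' into (entity_type, uri).

    Hypothetical helper for illustration only.
    """
    entity_type, sep, uri = mapping.partition(">")
    entity_type, uri = entity_type.strip(), uri.strip()
    if not sep or not entity_type or not uri:
        raise ValueError("expected '{type} > {uri}', got %r" % mapping)
    # Assumption: a URI counts as valid here if it has a scheme (e.g. http).
    if not urlparse(uri).scheme:
        raise ValueError("not a URI with a scheme: %r" % uri)
    return entity_type, uri


print(parse_type_mapping("DNA > http://www.bootstrep.eu/ontology/GRO#DNA"))
```

An entity type that has no such mapping would, per the note above, simply produce TextAnnotations without dc:type information.
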
+The following figure shows an engine configuration covering all Named
Entity types supported by the
[BioNLP2004](http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html) dataset.
+
+
+
+The same configuration can also be provided as an OSGi configuration file with
the name
'org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine-ehealthner.config'
and the following contents:
+
+    :::text
+    stanbol.enhancer.engine.name="ehealth-ner"
+    stanbol.engines.opennlp-ner.nameFinderModels=["bionlp2004-DNA-en.bin","bionlp2004-protein-en.bin","bionlp2004-cell_type-en.bin","bionlp2004-cell_line-en.bin","bionlp2004-RNA-en.bin"]
+    stanbol.engines.opennlp-ner.typeMappings=["DNA\ >\ http://www.bootstrep.eu/ontology/GRO#DNA","RNA\ >\ http://www.bootstrep.eu/ontology/GRO#RNA","protein\ >\ http://www.bootstrep.eu/ontology/GRO#Protein","cell_type\ >\ http://purl.bioontology.org/ontology/CL","cell_line\ >\ http://purl.bioontology.org/ontology/MCCL"]
+
+NOTE that the '.config' format requires spaces to be escaped with '\'.
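
The space-escaping rule for '.config' files can be sketched in Python as follows. The helpers `escape_config_value` and `config_array` are hypothetical and only cover what this page describes (escaping spaces inside quoted array values); any further escaping rules of the '.config' format are out of scope for this sketch.

```python
def escape_config_value(value):
    """Escape spaces with a backslash, as the '.config' format requires."""
    return value.replace(" ", "\\ ")


def config_array(key, values):
    """Render key=["v1","v2",...] with spaces escaped in each value.

    Hypothetical helper; it mirrors the layout of the example file only.
    """
    quoted = ",".join('"%s"' % escape_config_value(v) for v in values)
    return "%s=[%s]" % (key, quoted)


line = config_array(
    "stanbol.engines.opennlp-ner.typeMappings",
    [
        "DNA > http://www.bootstrep.eu/ontology/GRO#DNA",
        "RNA > http://www.bootstrep.eu/ontology/GRO#RNA",
    ],
)
print(line)
```
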