Hi all,

The task of defining a new Enhancement Structure for Apache Stanbol has
been outstanding for a long time (see STANBOL-351 [2]). Several
discussions were started over the past years, but none of them even
got as far as providing a good model.

Recently, thanks to the support of the research project Fusepool
P3 [1], I was able to spend time on this task, and with this mail I
would like to present the current state of this effort to the
community.

- - -

The Fusepool Annotation Model
-------------------------------------------

The Fusepool Annotation Model (FAM) is based on Open Annotation [3][4]
and uses NIF 2.0 [5][6] for Selectors and lower-level NLP annotations.
Summaries of Open Annotation and NIF are available at [4] and [6].

The FAM consists of two main parts:

1. The "Annotation Core" [7]: This defines the core annotation model
and is based on Open Annotation and NIF.
2. Several "Annotation Bodies" for different annotation types. These
include annotations for:
    * Content Language [8]: Used to annotate the language of the
content
    * Entity Mentions [9]: Used to describe Named Entities detected
in the analysed text
    * Entity Annotation [10]: Used to link Entities with the analysed
content
    * Linked Entity [11]: Combines an Entity Mention and an Entity
Annotation. Used to link a mention of an Entity with a single Entity
of a vocabulary (e.g. after disambiguation)
    * Entity Linking Choice and Entity Suggestion [12]: Used to
suggest several possible Entities for an Entity Mention
    * Topic Classification and Topic Annotation [13]: Used to classify
content with Topics of a classification scheme
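To make the split between the Annotation Core and the Annotation Bodies
a bit more concrete, here is a minimal Python sketch that models one
annotation as plain triples: an Open Annotation whose target points to
a NIF 2.0 selector, and whose body carries the type-specific part. The
OA and NIF terms come from those specifications; the FAM namespace URI,
the "EntityMention" body type name and the "-target" URI minting are
assumptions for illustration only, so [7] and [9] remain the
authoritative definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OA = "http://www.w3.org/ns/oa#"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
FAM = "http://vocab.fusepool.info/fam#"  # assumed namespace, illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, object]


@dataclass
class NifSelector:
    """Selects the characters [begin, end) of a document, NIF 2.0 style."""
    doc_uri: str
    begin: int
    end: int
    anchor: str

    @property
    def uri(self) -> str:
        # RFC 5147 style "char=" fragment, one of the NIF 2.0 URI schemes
        return f"{self.doc_uri}#char={self.begin},{self.end}"


@dataclass
class Annotation:
    """Core part: an oa:Annotation targeting a NIF-selected text span."""
    uri: str
    body_uri: str
    body_type: str  # the Annotation Body type (hypothetical name below)
    selector: NifSelector
    body_props: List[Tuple[str, object]] = field(default_factory=list)

    def to_triples(self) -> List[Triple]:
        target = self.uri + "-target"  # hypothetical URI minting scheme
        sel = self.selector
        triples: List[Triple] = [
            (self.uri, RDF_TYPE, OA + "Annotation"),
            (self.uri, OA + "hasBody", self.body_uri),
            (self.uri, OA + "hasTarget", target),
            (target, RDF_TYPE, OA + "SpecificResource"),
            (target, OA + "hasSource", sel.doc_uri),
            (target, OA + "hasSelector", sel.uri),
            (sel.uri, RDF_TYPE, NIF + "RFC5147String"),
            (sel.uri, NIF + "beginIndex", sel.begin),
            (sel.uri, NIF + "endIndex", sel.end),
            (sel.uri, NIF + "anchorOf", sel.anchor),
            (self.body_uri, RDF_TYPE, self.body_type),
        ]
        # Body-specific properties are what distinguishes the body types
        triples += [(self.body_uri, p, o) for p, o in self.body_props]
        return triples


# Usage: an Entity Mention for "Paris" (characters 0 to 5)
text = "Paris is the capital of France."
mention = Annotation(
    uri="urn:anno:1",
    body_uri="urn:body:1",
    body_type=FAM + "EntityMention",  # hypothetical term name
    selector=NifSelector("urn:doc:1", 0, 5, text[0:5]),
)
triples = mention.to_triples()
```

The point of the split is that the target/selector part stays identical
for every annotation type; only the body type and its properties change.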

With these predefined Annotation Bodies one can describe everything
that is currently supported by FISE, so the new model fully covers
the enhancement structure currently used by Apache Stanbol.


Migration options from FISE to FAM
------------------------------------------------

An easy migration from FISE to FAM was an important requirement. To
avoid having to adapt all existing Stanbol Engines to the new model,
the FAM was designed so that transformation rules from FISE to FAM can
be defined [14].
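As an illustration of what such a rule can look like, the following
Python sketch maps a FISE TextAnnotation that has a text selection
(fise:start, fise:end, fise:selected-text, fise:extracted-from) to an
Open Annotation with a NIF 2.0 selector. This is not the rule set from
[14]; the FAM namespace, the "EntityMention" and "entity-type" terms,
and the "-target"/"-body" URI minting are hypothetical placeholders,
and the graph is simply a Python set of triples.

```python
from typing import Dict, Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"
DC = "http://purl.org/dc/terms/"
OA = "http://www.w3.org/ns/oa#"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
FAM = "http://vocab.fusepool.info/fam#"  # assumed namespace

Triple = Tuple[str, str, object]


def props(graph: Iterable[Triple], subject: str) -> Dict[str, object]:
    """Predicate -> object map for one subject (assumes single values)."""
    return {p: o for s, p, o in graph if s == subject}


def fise_text_annotations_to_fam(graph: Set[Triple]) -> Set[Triple]:
    """Map FISE TextAnnotations with a selection to OA + NIF triples."""
    out: Set[Triple] = set()
    text_annotations = [s for s, p, o in graph
                        if p == RDF_TYPE and o == FISE + "TextAnnotation"]
    for ta in text_annotations:
        ta_props = props(graph, ta)
        start = ta_props.get(FISE + "start")
        end = ta_props.get(FISE + "end")
        doc = ta_props.get(FISE + "extracted-from")
        if start is None or end is None or doc is None:
            continue  # e.g. a language annotation has no text selection
        selector = f"{doc}#char={start},{end}"  # NIF 2.0 RFC 5147 URI
        target, body = ta + "-target", ta + "-body"  # hypothetical URIs
        out |= {
            (ta, RDF_TYPE, OA + "Annotation"),
            (ta, OA + "hasTarget", target),
            (ta, OA + "hasBody", body),
            (target, RDF_TYPE, OA + "SpecificResource"),
            (target, OA + "hasSource", doc),
            (target, OA + "hasSelector", selector),
            (selector, RDF_TYPE, NIF + "RFC5147String"),
            (selector, NIF + "beginIndex", start),
            (selector, NIF + "endIndex", end),
            (body, RDF_TYPE, FAM + "EntityMention"),  # hypothetical term
        }
        if FISE + "selected-text" in ta_props:
            out.add((selector, NIF + "anchorOf",
                     ta_props[FISE + "selected-text"]))
        if DC + "type" in ta_props:  # entity type, e.g. a DBpedia class
            out.add((body, FAM + "entity-type",  # hypothetical term
                     ta_props[DC + "type"]))
    return out


# Usage: one entity mention and one (skipped) language annotation
fise_graph = {
    ("urn:ta:1", RDF_TYPE, FISE + "TextAnnotation"),
    ("urn:ta:1", FISE + "extracted-from", "urn:doc:1"),
    ("urn:ta:1", FISE + "start", 0),
    ("urn:ta:1", FISE + "end", 5),
    ("urn:ta:1", FISE + "selected-text", "Paris"),
    ("urn:ta:2", RDF_TYPE, FISE + "TextAnnotation"),
    ("urn:ta:2", FISE + "extracted-from", "urn:doc:1"),
    ("urn:ta:2", DC + "language", "en"),
}
fam_graph = fise_text_annotations_to_fam(fise_graph)
```

A complete rule set would also have to cover fise:EntityAnnotation,
the dc:relation links between annotations, confidence values and the
language annotation skipped here.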

Such rules make it possible to implement a
"Fise2FamTransformationEngine" that, when added to the end of an
Enhancement Chain, lets users receive Enhancement Results based on
the FAM.
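Placing the engine at the end matters because it can only transform
annotations that earlier engines have already written. The toy Python
sketch below assumes a strictly sequential chain in which every engine
adds triples to one shared graph; real Stanbol chains are configured
differently, the engine functions are hypothetical stand-ins, and the
transformation step is reduced to re-typing each fise:TextAnnotation.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, object]
Graph = Set[Triple]
Engine = Callable[[str, Graph], None]  # (content, shared graph) -> None

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"
OA = "http://www.w3.org/ns/oa#"


def language_engine(content: str, g: Graph) -> None:
    # Stand-in for language detection: always adds one TextAnnotation
    g.add(("urn:ta:lang", RDF_TYPE, FISE + "TextAnnotation"))


def ner_engine(content: str, g: Graph) -> None:
    # Stand-in for NER: one TextAnnotation per capitalised word
    for i, word in enumerate(content.split()):
        if word[:1].isupper():
            g.add((f"urn:ta:ner{i}", RDF_TYPE, FISE + "TextAnnotation"))


def fise2fam_engine(content: str, g: Graph) -> None:
    # Simplified stand-in: re-type every TextAnnotation found so far
    found = [s for s, p, o in g
             if p == RDF_TYPE and o == FISE + "TextAnnotation"]
    for s in found:
        g.add((s, RDF_TYPE, OA + "Annotation"))


def run_chain(chain: List[Engine], content: str) -> Graph:
    g: Graph = set()
    for engine in chain:  # engines run in list order on one graph
        engine(content, g)
    return g


def count_fam(g: Graph) -> int:
    return sum(1 for s, p, o in g if p == RDF_TYPE and o == OA + "Annotation")


text = "Paris is the capital of France"
last = run_chain([language_engine, ner_engine, fise2fam_engine], text)
first = run_chain([fise2fam_engine, language_engine, ner_engine], text)
```

With the transformation engine last, all three TextAnnotations are
converted; placed first, it sees an empty graph and converts nothing.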

I will implement such an engine in the second half of August. It
will be compatible with both the 0.12.* and 1.0.0 versions of Apache
Stanbol.

As part of this effort I will also update the Nlp2RdfEngine [15] to
support NIF 2.0. Because FAM uses NIF selectors, such an engine
becomes much more relevant: NLP annotations serialized using NIF 2.0
will automatically be merged with the Selectors used by high-level FAM
annotations.
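One way to see why the merge happens without extra work: if both the
NIF 2.0 output of the NLP engine and the FAM selectors mint their URIs
with the same RFC 5147 "char=" scheme (an assumption here), the same
text span gets the same URI, so a plain union of the two graphs
already attaches both sets of statements to one resource. The Python
sketch uses naive whitespace tokenisation and a hypothetical target
URI.

```python
from typing import Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
OA = "http://www.w3.org/ns/oa#"

Triple = Tuple[str, str, object]


def nif_uri(doc: str, begin: int, end: int) -> str:
    """Mint a NIF 2.0 string URI with an RFC 5147 'char=' fragment."""
    return f"{doc}#char={begin},{end}"


def nlp_triples(doc: str, text: str) -> Set[Triple]:
    """Token-level NIF annotations (naive whitespace tokenisation)."""
    out: Set[Triple] = set()
    pos = 0
    for token in text.split():
        begin = text.index(token, pos)
        end = begin + len(token)
        pos = end
        uri = nif_uri(doc, begin, end)
        out |= {(uri, RDF_TYPE, NIF + "Word"), (uri, NIF + "anchorOf", token)}
    return out


def fam_selector_triples(doc: str, begin: int, end: int) -> Set[Triple]:
    """Selector part of a high-level annotation such as an Entity Mention."""
    target = "urn:target:1"  # hypothetical target URI
    sel = nif_uri(doc, begin, end)
    return {(target, OA + "hasSelector", sel),
            (sel, RDF_TYPE, NIF + "RFC5147String")}


doc, text = "urn:doc:1", "Paris is the capital of France"
merged = nlp_triples(doc, text) | fam_selector_triples(doc, 0, 5)
paris = nif_uri(doc, 0, 5)
# Token-level and FAM statements now share the subject `paris`
facts_about_paris = {(p, o) for s, p, o in merged if s == paris}
```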

Next Steps:
----------------

As part of my work on Fusepool I will implement the
Fise2FamTransformationEngine and update the Nlp2RdfEngine before the
end of September. Both engines will be Open Source and Apache
Licensed, meaning that by the end of September all current Stanbol
users will be able to play around with the new Annotation Model.

IMHO it would really make sense to deprecate the current FISE model
and migrate to a model based on Open Annotation and NIF. I am
confident that FAM is a good starting point in that direction.

From FISE to a Stanbol Annotation Model
--------------------------------------------------------

A possible migration path to a new model could look as follows:

* The Stanbol Community has a look at the FAM and tests it against
current use cases as soon as the Fise2FamTransformationEngine is
available.
* Based on the results of that process we can refine the FAM and
make it the preferred Enhancement Model. In doing so we should also
change its namespace to "http://stanbol.apache.org/ontology/".
* For Stanbol 0.12.* and 1.0.0 we will support the new model by
providing a transformation engine. For Stanbol 2.0.0 we would change
all engines to natively support the new model.

WDYT?
Rupert Westenthaler

[1] http://p3.fusepool.eu/
[2] https://issues.apache.org/jira/browse/STANBOL-351
[3] http://www.openannotation.org/spec/core/
[4] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/openannotation.md
[5] http://persistence.uni-leipzig.org/nlp2rdf/
[6] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/nif.md
[7] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#annotation-core
[8] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#language-annotation
[9] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-mention-annotation
[10] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-annotation
[11] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#linked-entity-annotation
[12] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-linking-choice-annotation
[13] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#topic-classification
[14] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#transformation-of-fise-to-the-fusepool-annotation-model
[15] http://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/nlp2rdf/

-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
