Hi all,

the task of defining a new Enhancement Structure for Apache Stanbol has been outstanding for a long time (see STANBOL-351 [2]). Over the past years several discussions were started, but none of them got as far as providing a good model.
Recently, thanks to the support of the research project Fusepool P3 [1], I was able to spend time on this task, and with this mail I would like to present the current state of this effort to the community.

- - -

The Fusepool Annotation Model
-----------------------------

The Fusepool Annotation Model (FAM) is based on Open Annotation [3][4] and uses NIF 2.0 [5][6] for Selectors and lower level NLP annotations. Summaries of Open Annotation and NIF are available at [4] and [6].

The FAM is made up of two main parts:

1. The "Annotation Core" [7]: This defines the core annotation model and is based on Open Annotation and NIF.
2. Several "Annotation Bodies" for different annotation types. Those bodies include annotations for
    * Content Language [8]: Used to annotate the language of the content
    * Entity Mentions [9]: Used to describe Named Entities detected in the parsed text
    * Entity Annotation [10]: Used to link Entities with the analysed content
    * Linked Entity [11]: Combines an Entity Mention and an Entity Annotation. Used to link a mention of an Entity with a single Entity of a Vocabulary (e.g. after disambiguation)
    * Entity Linking Choice and Entity Suggestion [12]: Used to suggest several possible Entities for an Entity Mention
    * Topic Classification and Topic Annotation [13]: Used to classify a content item with several Topics of a classification scheme

With those predefined Annotation Bodies one can describe everything that is currently supported by FISE. The new model therefore fully covers the enhancement structure currently used by Apache Stanbol.

Migration options from FISE to FAM
----------------------------------

An easy migration from FISE to the FAM was an important requirement. To avoid having to adapt all existing Stanbol Engines to the new model, the decision was to define the FAM in such a way that transformation rules from FISE to FAM can be defined [14].
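
To give a feeling for what a rule-based transformation means, here is a minimal Python sketch. It is an illustration only and not the rule set defined in [14]: the rule table, the function name `transform` and the example URNs are assumptions made for this mail. It only renames a few FISE predicates to the corresponding NIF 2.0 selector predicates and passes every other triple through unchanged. An actual transformation would most likely also have to restructure resources (e.g. split a fise:TextAnnotation into an annotation, a body and a selector), which this sketch does not attempt.

```python
# Illustrative sketch only: this rule table is a hypothetical example,
# NOT the transformation rules defined in [14].
FISE = "http://fise.iks-project.eu/ontology/"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

# Assumed mapping: FISE selection predicate -> NIF 2.0 selector predicate
SELECTOR_RULES = {
    FISE + "start": NIF + "beginIndex",
    FISE + "end": NIF + "endIndex",
    FISE + "selected-text": NIF + "anchorOf",
}


def transform(triples):
    """Rename predicates that have a rule; keep all other triples as they are."""
    return [(s, SELECTOR_RULES.get(p, p), o) for (s, p, o) in triples]


if __name__ == "__main__":
    # Hypothetical enhancement: a text annotation selecting "Paris" at [10, 15)
    enhancement = [
        ("urn:example:ta-1", FISE + "start", 10),
        ("urn:example:ta-1", FISE + "end", 15),
        ("urn:example:ta-1", FISE + "selected-text", "Paris"),
        ("urn:example:ta-1", FISE + "confidence", 0.9),
    ]
    for triple in transform(enhancement):
        print(triple)
```

Keeping the rules in a plain lookup table means they can be reviewed and changed without touching the code that applies them.
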
Having such rules makes it possible to implement a "Fise2FamTransformationEngine" that, if added to the end of an Enhancement Chain, allows users to receive Enhancement Results based on the FAM. I will implement such an engine in the 2nd half of August. It will be compatible with both the 0.12.* and 1.0.0 versions of Apache Stanbol.

As part of this effort I will also update the Nlp2RdfEngine [15] to support NIF 2.0. Because the FAM uses NIF selectors, such an engine becomes much more relevant: NLP annotations serialized using NIF 2.0 will automatically be merged with the Selectors used by the high level FAM annotations.

Next Steps:
-----------

As part of my work on Fusepool I will implement the Fise2FamTransformationEngine and update the Nlp2RdfEngine before the end of September. Both engines will be Open Source and Apache Licensed. This means that by the end of September all current Stanbol users will be able to play around with the new Annotation Model.

IMHO it would really make sense to deprecate the current FISE model and migrate to a model based on Open Annotation and NIF. I am confident that the FAM is a good starting point in that direction.

From FISE to a Stanbol Annotation Model
---------------------------------------

A possible path for migrating to a new model could look as follows:

* The Stanbol Community has a look at the FAM and tests it against current use cases as soon as the Fise2FamTransformationEngine is available.
* Based on the results of that process we can refine the FAM and make it the preferred Enhancement Model. At that point we should also change its namespace to use "http://stanbol.apache.org/ontology/".
* For Stanbol 0.12.* and 1.0.0 we will support the new model by providing a transformation engine. For Stanbol 2.0.0 we would change all engines to natively support the new model.

WDYT?

Rupert Westenthaler

[1] http://p3.fusepool.eu/
[2] https://issues.apache.org/jira/browse/STANBOL-351
[3] http://www.openannotation.org/spec/core/
[4] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/openannotation.md
[5] http://persistence.uni-leipzig.org/nlp2rdf/
[6] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/nif.md
[7] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#annotation-core
[8] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#language-annotation
[9] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-mention-annotation
[10] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-annotation
[11] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#linked-entity-annotation
[12] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#entity-linking-choice-annotation
[13] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#topic-classification
[14] https://github.com/fusepoolP3/overall-architecture/blob/master/wp3/fp-anno-model/fp-anno-model.md#transformation-of-fise-to-the-fusepool-annotation-model
[15] http://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/nlp2rdf/

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/