Hi Sebastian,

On Thu, May 23, 2013 at 5:38 PM, Sebastian Hellmann
<hellm...@informatik.uni-leipzig.de> wrote:
> Hello Reto,
> ah yes, I read up on some of the Apache procedures to see how you are
> working and I see now that the mailing list is the most important means of
> communication here.   Thanks for pointing me to the issue. In principle, do
> you want me to comment here or in the issue tracker?

I favor starting this discussion here on the list and only moving it
over to Jira once there is some general agreement on how to proceed.

>
> Regarding the different models:
>
> First of all alignment should happen with the "Open Annotation Data Model":
> http://www.w3.org/ns/oa#
> This is the most current version. Annotation Ontology was merged into it.
>

I read through the current state of the Open Annotation Data Model and
my first impression is that adopting it would increase the complexity
of annotations created by Apache Stanbol.

Notes and Questions:

(1) All fise:Enhancement instances would become oa:Annotations, and
all fise:{Name}Annotation data (e.g. fise:TextAnnotation,
fise:EntityAnnotation, fise:TopicAnnotation) would need to be
separated into their own resources, linked to the oa:Annotation via
the oa:hasBody property. IMO such a separation of fise:Enhancement and
fise:{Name}Annotation would just add complexity without bringing any
advantage. So I would prefer to use only the oa:Annotation part and
completely ignore the oa:hasBody side. While this is in principle in
line with the standard (as Annotations without a body are supported),
it is for sure not in line with the intentions of the standard.
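
To make this concrete, here is a rough Turtle sketch of the difference
(all URIs and values are made up, prefixes abbreviated):

    @prefix fise: <http://fise.iks-project.eu/ontology/> .
    @prefix oa:   <http://www.w3.org/ns/oa#> .
    @prefix dc:   <http://purl.org/dc/terms/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    # current structure: a single resource carries both the generic
    # enhancement metadata and the annotation specific properties
    <urn:enhancement-1> a fise:Enhancement , fise:TextAnnotation ;
        fise:extracted-from <urn:content-item-1> ;
        fise:selected-text "Bob Marley" ;
        dc:created "2013-05-24T10:00:00Z"^^xsd:dateTime .

    # OA aligned structure: the annotation specific part would have to
    # move to a separate body resource
    <urn:enhancement-2> a oa:Annotation ;
        oa:hasTarget <urn:content-item-1> ;
        oa:hasBody <urn:enhancement-2-body> ;
        dc:created "2013-05-24T10:00:00Z"^^xsd:dateTime .
    <urn:enhancement-2-body> a fise:TextAnnotation ;
        fise:selected-text "Bob Marley" .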

(2) How would Enhancements that depend on or relate to other
Enhancements be represented? The typical case is a
fise:EntityAnnotation that suggests (dc:relation) an entity for a
fise:TextAnnotation. I have not found an example like that in [1].
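
For reference, this is roughly how such a dependency looks in the
current Stanbol Enhancement Structure (URIs and values are made up):

    @prefix fise: <http://fise.iks-project.eu/ontology/> .
    @prefix dc:   <http://purl.org/dc/terms/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    <urn:enhancement-ta1> a fise:Enhancement , fise:TextAnnotation ;
        fise:extracted-from <urn:content-item-1> ;
        fise:selected-text "Bob Marley" ;
        fise:start "23"^^xsd:int ;
        fise:end "33"^^xsd:int .

    <urn:enhancement-ea1> a fise:Enhancement , fise:EntityAnnotation ;
        fise:extracted-from <urn:content-item-1> ;
        dc:relation <urn:enhancement-ta1> ;
        fise:entity-reference <http://dbpedia.org/resource/Bob_Marley> ;
        fise:entity-label "Bob Marley" ;
        fise:confidence "0.87"^^xsd:double .

It is not obvious to me how this dc:relation link between two
annotations would be expressed in the OA model.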

(3) oa:SpecificResources could be used to explicitly model Blobs in a
ContentItem. The 'source' would represent the parsed content (e.g. a
PDF), while the oa:SpecificResource would represent the extracted
plain text Blob. oa:Selectors would then clearly state that they are
relative to the text/plain Blob and not to the originally parsed PDF
document. On the downside this model would introduce a lot of
indirection for users that are only interested in e.g. the
fise:selected-text (oa:exact of the oa:TextQuoteSelector) of a
fise:TextAnnotation. A Turtle sketch of this indirection follows
below.

    * The use of W3C Media Fragments for oa:Selectors would be a
great addition to Apache Stanbol. However, this would require
separating the selection specific properties (e.g. fise:start,
fise:end, fise:selected-text) from the fise:TextAnnotation into their
own resource. The reason is that with W3C Media Fragments all
resources selecting the same part of the text would use the same URI,
while currently it is possible to have multiple fise:TextAnnotations
selecting the same section of the text. Also note that the Open
Annotation Data Model defines two indirections between the annotation
and the selected part of the content (oa:Annotation -- oa:hasTarget
--> oa:SpecificResource -- oa:hasSelector --> oa:TextQuoteSelector).
IMO Stanbol should define a direct relation (a shortcut property)
between annotation and selector.
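
Here is the sketch mentioned above, showing the two indirections
(again with made-up URIs; the shortcut property name is only a
placeholder, not an existing fise property):

    @prefix oa:   <http://www.w3.org/ns/oa#> .
    @prefix fise: <http://fise.iks-project.eu/ontology/> .

    <urn:enhancement-ta1> a oa:Annotation ;
        oa:hasTarget <urn:target-1> .

    <urn:target-1> a oa:SpecificResource ;
        oa:hasSource <urn:content-item-1> ;    # e.g. the parsed PDF
        oa:hasSelector <urn:selector-1> .

    <urn:selector-1> a oa:TextQuoteSelector ;
        oa:exact "Bob Marley" ;
        oa:prefix "concert of " ;
        oa:suffix " in Kingston" .

    # possible Stanbol shortcut (placeholder property name):
    # <urn:enhancement-ta1> fise:selector <urn:selector-1> .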

(4) I have not found anything related to the confidence of
annotations. I think this is because the current model has a focus on
manual annotations. IMO the missing concept of a confidence is also
the reason for the issue noted in relation to ordering in the
"Multiplicity Constructs" section of the Open Annotation Data Model.


(5) Multiplicity Constructs: This section reveals an additional
difference in semantics between the Stanbol Enhancement Structure and
the Open Annotation Data Model. While the OA spec notes "The semantics
defined in the Core for multiple Bodies and Targets are that each
resource is related to the others individually", in Stanbol multiple
relations from a fise:TextAnnotation to fise:EntityAnnotations are NOT
considered individual, but are treated as multiple suggestions for the
same context. Because of this there is no need for multiplicity
constructs in Stanbol.
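
A small sketch of the difference (made-up URIs and values; the
oa:Choice part is only my reading of the Multiplicity Constructs
section):

    @prefix fise: <http://fise.iks-project.eu/ontology/> .
    @prefix oa:   <http://www.w3.org/ns/oa#> .
    @prefix dc:   <http://purl.org/dc/terms/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    # Stanbol today: several independent fise:EntityAnnotations
    # pointing to the same fise:TextAnnotation are already read as
    # alternative suggestions, ranked by fise:confidence
    <urn:enhancement-ea1> a fise:EntityAnnotation ;
        dc:relation <urn:enhancement-ta1> ;
        fise:entity-reference <http://dbpedia.org/resource/Paris> ;
        fise:confidence "0.9"^^xsd:double .
    <urn:enhancement-ea2> a fise:EntityAnnotation ;
        dc:relation <urn:enhancement-ta1> ;
        fise:entity-reference <http://dbpedia.org/resource/Paris,_Texas> ;
        fise:confidence "0.4"^^xsd:double .

    # OA would need an explicit multiplicity construct (e.g. an
    # oa:Choice body) to carry the same information
    <urn:enhancement-ta1> a oa:Annotation ;
        oa:hasBody <urn:choice-1> .
    <urn:choice-1> a oa:Choice ;
        oa:default <http://dbpedia.org/resource/Paris> ;
        oa:item <http://dbpedia.org/resource/Paris> ,
                <http://dbpedia.org/resource/Paris,_Texas> .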

Based on those observations my first impression is that a full
adoption of the Open Annotation Data Model would require a complete
re-thinking of how annotations are composed and would result in a
complete rewrite of everything in Stanbol that is related to RDF. IMO
the resulting RDF would also be much harder to consume and produce,
and would therefore affect both users that need to extract information
from the enhancement results and developers that want to implement
their own Enhancement Engine.

However, as I noted in the beginning, those observations are based on
a first look at the Open Annotation Data Model, so I might well have
missed a much better alignment with the Stanbol Enhancement Structure.

best
Rupert


[1] http://www.openannotation.org/spec/core/

> I really recommend grounding any work on their model, as it is really good
> and powerful. I am not sure, however, whether it provides the right level of
> scalability for NLP.
> Looking at:
> http://de.slideshare.net/paolociccarese/open-annotation-specifiers-and-specific-resources-tutorial
> There are 3 important things missing:
> - inclusion of the actual text in the web service request
> - providing best practices for identifiers, e.g.
> http://purl.org/olia/penn.owl#DT
> - reducing the number of URNs and triples
>
> This is where NIF comes in. (If you are in doubt, please try to create an OA
> example where a simple sentence is POS annotated over a web service).
>
> Regarding Rupert's problem with backward compatibility:
> In a first step, it should be enough to build an RDF parser/serializer based
> on the new OWL file.
>
> I didn't yet understand what exactly is meant by "Stanbol Enhancement
> Structure"[1].
> Is this the OWL file for serializing annotations (e.g. for use in SPARQL) or
> does it describe the internal structure of the Stanbol Java Framework?
>
> I think the second one can stay as it is for now, and the new structure
> should be created (as a serialization format) in the meantime, with the
> clear aim of replacing the former in the future. This would give all
> clients enough time to adapt.
>
> What do you think?
>
> All the best,
> Sebastian
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>
>
>
> On 23.05.2013 14:12, Reto Bachmann-Gmür wrote:
>
>> Hi Sebastian
>>
>> Are you aware of https://issues.apache.org/jira/browse/STANBOL-351?
>>
>> Rather than doing telcos we should discuss things on the list.
>>
>> Cheers,
>> Reto
>>
>>
>> On Thu, May 23, 2013 at 9:27 AM, Sebastian Hellmann <
>> hellm...@informatik.uni-leipzig.de> wrote:
>>
>>> Hi all,
>>> we created an OWL schema called the NLP Interchange Format (NIF), which
>>> leverages Apache Stanbol's FISE ontology.
>>> Recent documentation is here:
>>>
>>> http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
>>>
>>> Personally, I think the general structure (using URN for each annotation)
>>> is quite good, but I am a little bit unhappy with some facts:
>>> 1. URL persistence: when will the FISE ontology move from IKS to the
>>> Apache Stanbol namespace? In my opinion, sooner is better. The longer it
>>> is out there, the more side effects it will cause:
>>> http://xkcd.com/1172/
>>> 2. Some issues need discussion and some streamlining. I would be happy
>>> to be of assistance and would offer to hold some ontology telcos to get
>>> it straight.
>>> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/servicesapi/src/main/resources/fise.owl
>>> e.g.
>>> - start and end have xsd:int, limiting it to a ~2GB text file
>>> - extracted-from might not need to be functional. Also there might be a
>>> relation to prov:wasDerivedFrom
>>> These issues all need discussion however.
>>>
>>> Any ideas on how to proceed?
>>>
>>> All the best,
>>> Sebastian
>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Events: NLP & DBpedia 2013
>>> (http://nlp-dbpedia2013.blogs.aksw.org,
>>> Deadline: *July 8th*)
>>> Come to Germany as a PhD:
>>> http://bis.informatik.uni-leipzig.de/csf
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage:
>>> http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline:
> *July 8th*)
> Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
>
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org



--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
