Hi Sebastian,

On Thu, May 23, 2013 at 5:38 PM, Sebastian Hellmann
<hellm...@informatik.uni-leipzig.de> wrote:
> Hello Reto,
> ah yes, I read up on some of the Apache procedures to see how you are
> working and I see now that the mailing list is the most important means of
> communication here. Thanks for pointing me to the issue. In principle, do
> you want me to comment here or in the issue tracker?

I would prefer to start this discussion here on the list and only move it
over to Jira once there is some general agreement on how to proceed.

> Regarding the different models:
>
> First of all alignment should happen with the "Open Annotation Data Model":
> http://www.w3.org/ns/oa#
> This is the most current version. Annotation Ontology was merged into it.

I read through the current state of the Open Annotation Data Model and my
first impression is that adopting it would increase the complexity of the
annotations created by Apache Stanbol.

Notes and questions:

(1) Every fise:Enhancement would become an oa:Annotation, and every
fise:{Name}Annotation (e.g. fise:TextAnnotation, fise:EntityAnnotation,
fise:TopicAnnotation) would need to be separated out into its own resource
linked to the oa:Annotation via the oa:hasBody property (see the sketch
below). IMO such a separation of fise:Enhancement and fise:{Name}Annotation
would just add complexity without bringing any advantage, so I would prefer
to use only the oa:Annotation part and ignore the oa:hasBody side
completely. While this is in principle in line with the standard (as
annotations without a body are supported), it is for sure not in line with
the intentions of the standard.
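To make this concrete, here is a rough Turtle sketch (prefix declarations
omitted; the URNs are purely illustrative and this is not a worked-out
mapping): today all information of an enhancement lives on a single
resource, while the OA style requires a second resource linked via
oa:hasBody:

    # current Stanbol Enhancement Structure: one resource per enhancement
    <urn:enhancement:1> a fise:Enhancement , fise:TextAnnotation ;
        fise:extracted-from <urn:content-item:1> ;
        fise:selected-text "Paris" .

    # Open Annotation style: annotation and body become two resources
    <urn:annotation:1> a oa:Annotation ;
        oa:hasTarget <urn:content-item:1> ;
        oa:hasBody <urn:body:1> .

    <urn:body:1> a fise:TextAnnotation ;
        fise:selected-text "Paris" .

The body resource does not carry any information that could not be stated
directly on the annotation itself.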
(2) How would Enhancements that depend on or relate to other Enhancements
be represented? The typical case is a fise:EntityAnnotation that suggests
(via dc:relation) an entity for a fise:TextAnnotation. I have not found an
example for that in [1].

(3) oa:SpecificResources could be used to explicitly model the Blobs of a
ContentItem: the source would represent the parsed content (e.g. a PDF) and
the oa:SpecificResource would represent the extracted plain-text Blob.
oa:Selectors would then clearly state that they are relative to the
text/plain Blob and not to the originally parsed PDF document. On the
downside this model introduces a lot of indirection for users that are only
interested in, e.g., the fise:selected-text (the oa:exact of the
oa:TextQuoteSelector) of a fise:TextAnnotation.

* The use of W3C Media Fragments for oa:Selectors would be a great addition
to Apache Stanbol. However, this would require separating the
selection-specific properties (e.g. fise:start, fise:end,
fise:selected-text) from the fise:TextAnnotation into their own resource.
The reason is that with W3C Media Fragments all resources selecting the
same part of the text would use the same URI, while currently it is
possible to have multiple fise:TextAnnotations selecting the same section
of the text. Note also that the Open Annotation Data Model defines two
indirections between the annotation and the selected part of the content
(oa:Annotation -- oa:hasTarget --> oa:SpecificResource -- oa:hasSelector
--> oa:TextQuoteSelector). IMO Stanbol should define a direct relation
(a shortcut property) between the annotation and the selector, as sketched
below.
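To illustrate the indirection (again only a sketch; prefix declarations
omitted, URNs purely illustrative, and the exact property names such as
oa:hasSource would need to be checked against the spec): what is currently
a single resource becomes a chain of three resources:

    # current: the selection is stated directly on the fise:TextAnnotation
    <urn:enhancement:1> a fise:TextAnnotation ;
        fise:start "4"^^xsd:int ;
        fise:end "9"^^xsd:int ;
        fise:selected-text "Paris" .

    # Open Annotation: annotation -> specific resource -> selector
    <urn:annotation:1> a oa:Annotation ;
        oa:hasTarget <urn:target:1> .

    <urn:target:1> a oa:SpecificResource ;
        oa:hasSource <urn:blob:plain-text> ;
        oa:hasSelector <urn:selector:1> .

    <urn:selector:1> a oa:TextQuoteSelector ;
        oa:exact "Paris" .

A client that is only interested in the selected text has to follow two
additional triples before it reaches oa:exact.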
(4) I have not found anything related to the confidence of annotations. I
think this is because the current model has a focus on manual annotations.
IMO the missing concept of confidence is also the reason for the ordering
issue described in the "Multiplicity Constructs" section of the Open
Annotation Data Model.

(5) Multiplicity Constructs: this section reveals an additional difference
in semantics between the Stanbol Enhancement Structure and the Open
Annotation Data Model. While the OA spec notes that "The semantics defined
in the Core for multiple Bodies and Targets are that each resource is
related to the others individually", in Stanbol multiple relations from a
fise:TextAnnotation to fise:EntityAnnotations are NOT considered
individual, but rather multiple suggestions for the same context. Because
of this there is no need for multiplicity constructs in Stanbol.

Based on those observations my first impression is that a full adoption of
the Open Annotation Data Model would require a complete re-thinking of how
annotations are composed and would result in a complete rewrite of
everything in Stanbol that is related to RDF. IMO the resulting RDF would
also be much harder to consume and produce, and would therefore affect
both users that need to extract information from the enhancement results
and developers that want to implement their own Enhancement Engines.
However, as noted in the beginning, those observations are based on a
first look at the Open Annotation Data Model, so I may well have missed a
much better alignment with the Stanbol Enhancement Structure.

best
Rupert

[1] http://www.openannotation.org/spec/core/

> I really recommend grounding any work on their model, as it is really good
> and powerful. I am not sure however, whether, it provides the right level of
> scalability for NLP.
> Looking at:
> http://de.slideshare.net/paolociccarese/open-annotation-specifiers-and-specific-resources-tutorial
> There are 3 important things missing:
> - inclusion of the actual text in the web service request
> - providing best practices for identifiers, e.g.
> http://purl.org/olia/penn.owl#DT
> - reducing the number of URNs and triples
>
> This is where NIF comes in. (If you are in doubt, please try to create an OA
> example where a simple sentence is POS annotated over a web service).
>
> Regarding Ruperts problem with backward compatibility.
> In a first step, it should be enough to build an RDF parser/serializer based
> on the new OWL file.
>
> I didn't yet understand, what is meant exactly by "Stanbol Enhancement
> Structure"[1].
> Is this the OWL file for serializing annotations (e.g. for use in SPARQL) or
> does it describe the internal structure of the Stanbol Java Framework?
>
> I think the second one can stay as it is for now and then the new structure
> should be created (as serialization format) meanwhile with the clear aim to
> replace the former in the future. This would give all clients enough time to
> adapt.
>
> What do you think?
>
> All the best,
> Sebastian
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>
>
> On 23.05.2013 14:12, Reto Bachmann-Gmür wrote:
>
>> Hi Sebastian
>>
>> Are you aware of https://issues.apache.org/jira/browse/STANBOL-351?
>>
>> Rtaher than doing telcos we should discus things on the list.
>>
>> Cheers,
>> Reto
>>
>>
>> On Thu, May 23, 2013 at 9:27 AM, Sebastian Hellmann <
>> hellm...@informatik.uni-leipzig.de> wrote:
>>
>>> Hi all,
>>> we created an OWL schema called NLP Interchange Format (NIF), which
>>> leverages Apache Stanbols FISE ontology.
>>> Recent documentation is here:
>>> http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
>>>
>>> Personally, I think the general structure (using URN for each annotation)
>>> is quite good, but I am a little bit unhappy with some facts:
>>> 1. URL persistence: when will the FISE ontology move from IKS to the
>>> Apache Stanbol namespace. In my opinion, sooner is better. The longer it is
>>> out there, the more side effects it will cause: http://xkcd.com/1172/
>>> 2. Some issues need discussions and some streamlining. I would be happy to
>>> be of assistance and would offer to hold some Ontology telcos to get it
>>> straight.
>>> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/servicesapi/src/main/resources/fise.owl
>>> e.g.
>>> - start and end have xsd:int limiting it to a 4GB text file
>>> - extracted-from might not need to be functional. Also there might be a
>>> relation to prov:wasDerivedFrom
>>> These issues all need discussion however.
>>>
>>> Any ideas on how to proceed?
>>>
>>> All the best,
>>> Sebastian
>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
>>> Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
> Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen