[
https://issues.apache.org/jira/browse/TAVERNA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stian Soiland-Reyes updated TAVERNA-1044:
-----------------------------------------
Summary: Parsing COMBINE archive from JWSOnline skips metadata.rdf (was:
COMBIE parsing of JWSOnline skips metadata.rdf)
> Parsing COMBINE archive from JWSOnline skips metadata.rdf
> ---------------------------------------------------------
>
> Key: TAVERNA-1044
> URL: https://issues.apache.org/jira/browse/TAVERNA-1044
> Project: Apache Taverna
> Issue Type: Bug
> Components: Taverna Language
> Affects Versions: language 0.15.1
> Reporter: Stian Soiland-Reyes
> Assignee: Stian Soiland-Reyes
> Priority: Major
> Fix For: language 0.16.0
>
>
> When parsing a COMBINE archive from [JWS Online|http://jjj.mib.ac.uk/], such
> as
> http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1
> the metadata.rdf does not seem to be parsed.
> h2. Error trace
> {code}
> stain@biggie:/tmp$ curl -fO --remote-header-name 'http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1'
> curl: Saved to filename 'adlung2017_fig2f.sedx'
> stain@biggie:/tmp$ java -jar ~/software/taverna-tavlang-tool-0.15.1-incubating.jar convert --robundle adlung2017_fig2f.sedx
> ..
> May 10, 2018 10:35:43 AM org.apache.taverna.robundle.manifest.combine.CombineManifest findAnnotations
> WARNING: Can't parse /metadata.rdf
> org.apache.jena.riot.RiotException: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
>     at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
>     at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.error(ARPSaxErrorHandler.java:37)
>     at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:196)
>     at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:173)
>     at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:168)
>     at org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.warning(ParserSupport.java:194)
>     at org.apache.jena.rdfxml.xmlinput.states.Frame.warning(Frame.java:55)
>     at org.apache.jena.rdfxml.xmlinput.states.Frame.characters(Frame.java:164)
>     at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.characters(XMLHandler.java:137)
>     at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
>     at org.apache.xerces.impl.XMLNamespaceBinder.characters(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:150)
>     at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
>     at org.apache.jena.riot.lang.LangRDFXML.parse(LangRDFXML.java:142)
>     at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:175)
>     at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:905)
>     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:256)
>     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:242)
>     at org.apache.taverna.robundle.manifest.combine.CombineManifest.parseRDF(CombineManifest.java:240)
>     at org.apache.taverna.robundle.manifest.combine.CombineManifest.findAnnotations(CombineManifest.java:332)
>     at org.apache.taverna.robundle.manifest.combine.CombineManifest.readCombineArchive(CombineManifest.java:465)
>     at org.apache.taverna.robundle.Bundle.readOrPopulateManifest(Bundle.java:121)
>     at org.apache.taverna.robundle.Bundle.getManifest(Bundle.java:87)
>     at org.apache.taverna.tavlang.tools.convert.ToRobundle.convert(ToRobundle.java:60)
>     at org.apache.taverna.tavlang.tools.convert.ToRobundle.<init>(ToRobundle.java:47)
>     at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.runcommand(CommandLineTool.java:226)
>     at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.execute(CommandLineTool.java:220)
>     at org.apache.taverna.tavlang.CommandLineTool.parse(CommandLineTool.java:71)
>     at org.apache.taverna.tavlang.TavernaCommandline.main(TavernaCommandline.java:26)
> {code}
> h2. Analysis
> This seems to be caused by invalid RDF/XML in the metadata.rdf added by JWS
> Online:
> {code}
> stain@biggie:/tmp$ unzip adlung2017_fig2f.sedx
> stain@biggie:/tmp$ riot metadata.rdf
> 10:39:17 ERROR riot :: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
> 10:39:17 ERROR riot :: [line: 43, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
> 10:39:17 ERROR riot :: [line: 152, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
> ...
> <file:///tmp/> <http://purl.org/dc/terms/description> "Built by JWS Online." .
> _:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/W3CDTF> .
> <file:///tmp/> <http://purl.org/dc/terms/created> _:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 .
> <file:///tmp/models/adlung1.sbml> <http://purl.org/dc/terms/description> "Exported by JWS Online from ..."
> {code}
> The broken RDF/XML follows this pattern:
> {code:xml}
> <rdf:Description rdf:about=".">
>   <dcterms:description>Built by JWS Online.</dcterms:description>
>   <dcterms:created>
>     <dcterms:W3CDTF>2018-05-10T02:38:51Z</dcterms:W3CDTF>
>   </dcterms:created>
> </rdf:Description>
> {code}
> As Jena points out, this is not valid RDF/XML: it says that the property
> dcterms:created points to a new anonymous resource of type dcterms:W3CDTF,
> but a resource element can't directly wrap a literal. The literal then needs
> a new nested property like <rdf:value>.
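> A minimal sketch of what such an rdf:value variant could look like, reusing
> the timestamp from the pattern above. It is untested and assumes the rdf: and
> dcterms: prefixes are declared on the enclosing rdf:RDF element:
> {code:xml}
> <rdf:Description rdf:about=".">
>   <dcterms:description>Built by JWS Online.</dcterms:description>
>   <dcterms:created>
>     <!-- anonymous resource typed as dcterms:W3CDTF -->
>     <dcterms:W3CDTF>
>       <!-- the literal is now the object of a property -->
>       <rdf:value>2018-05-10T02:38:51Z</rdf:value>
>     </dcterms:W3CDTF>
>   </dcterms:created>
> </rdf:Description>
> {code}
> Here the elements alternate property, resource, property, literal, which
> is the striping the RDF/XML parser expects.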
> This confusion probably stems from
> http://identifiers.org/combine.specifications/omex.version-1 whose
> example, for some reason, uses dcterms:W3CDTF as a property of an untyped
> anonymous resource under dcterms:created:
> {code:xml}
> <dcterms:created rdf:parseType="Resource">
>   <dcterms:W3CDTF>2014-06-26T10:29:00Z</dcterms:W3CDTF>
> </dcterms:created>
> {code}
> This is semantically wrong as
> [dcterms:W3CDTF|http://dublincore.org/documents/dcmi-terms/#terms-W3CDTF] is
> defined as a Datatype (like int), not a Property. Similarly
> [dcterms:created|http://dublincore.org/documents/dcmi-terms/#terms-created]
> is defined with a range rdfs:Literal, which would not include a new W3CDTF
> Resource.
> I believe dcterms:W3CDTF is meant as a grouping of the XSD datatypes like
> [xsd:dateTime|https://www.w3.org/TR/xmlschema11-2/#dateTime] but is listed in
> DCTerms for pure XML users.
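> Read literally as a datatype, dcterms:W3CDTF would presumably be attached to
> the literal itself, roughly as sketched below. This is only an illustration
> based on the DCMI definition and is untested; it is not a form taken from the
> OMEX specification or from JWS Online, and it is an open assumption whether
> consumers recognize this datatype:
> {code:xml}
> <!-- dcterms:W3CDTF used as the datatype of the literal, not as a property -->
> <dcterms:created
>     rdf:datatype="http://purl.org/dc/terms/W3CDTF">2014-06-26T10:29:00Z</dcterms:created>
> {code}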
> dcterms:created is more commonly used with a typed RDF literal rather than
> through some kind of anonymous "timestamp" resource. So normal use (outside
> COMBINE) would be:
> {code:xml}
> <dcterms:created
> rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-06-26T10:29:00Z</dcterms:created>
> {code}
> Our [CombineManifest
> code|https://github.com/apache/incubator-taverna-language/blob/0.15.1-incubating/taverna-robundle/src/main/java/org/apache/taverna/robundle/manifest/combine/CombineManifest.java#L366]
> supports both variants as the {{parseType=Resource}} variant is commonly
> used by COMBINE producers.
> The example from JWS Online, however, is in between the two. I have let the
> authors know and recommended that they use either the rdf:value or the
> rdf:datatype variant. However, the tavlang converter should then recognize
> rdf:value.
> While it seems Jena's "riot" on the command line can ignore this syntactic
> error and parse the other triples, loading with Jena's RDFDataMgr.read()
> seems to bail out on the first error, meaning we also lose the
> dcterms:creator statements, which are correctly defined in the metadata.rdf.
> This bug is to investigate whether it's possible to reduce this error to a
> warning, and to add support for the rdf:value variant, which we can
> recommend to JWS Online instead of the semantically broken
> parseType="Resource" pattern.