[ 
https://issues.apache.org/jira/browse/TAVERNA-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377071#comment-16377071
 ] 

Stian Soiland-Reyes commented on TAVERNA-1037:
----------------------------------------------

Thank you for your interest [~gmora1223] - and great that you have already 
had a look! A background in semantic technologies and RDF4J will definitely 
help! A related topic could be changing Taverna Language to use Commons RDF 
instead of a direct Jena dependency (TAVERNA-1017).

I think the app:// URI is currently created by 
https://github.com/apache/incubator-taverna-language/blob/master/taverna-robundle/src/main/java/org/apache/taverna/robundle/fs/BundleFileSystemProvider.java
 when "mounting" the zip file - for now it is just a fresh random UUID each 
time. As such it is currently not possible to open a bundle from a UUID, as 
there is no way to know which file a given UUID refers to (the 
BundleFileSystemProvider is stateless, it does not remember).  Once the file 
system is mounted ("open"), looking up paths using the UUID app:// (soon 
arcp) URI works.

This is mainly the intention of the arcp scheme as well - an app finds an 
archive file, "mounts" it somehow and associates it with a corresponding 
arcp base URI.  It then needs to either keep that association for the 
duration of the session (random UUID), or be able to recreate it (e.g. from 
a hash of the byte content, or a UUID v5 derived from the source URL).
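
To make the three options concrete, here is a minimal sketch of how such arcp 
base URIs could be built with plain JDK classes. The class and method names 
(ArcpBases, randomBase, locationBase, hashBase) are made up for illustration 
and are not part of Taverna Language; the URI layouts are my reading of the 
arcp draft and should be checked against it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not an existing Taverna class.
class ArcpBases {
    // Name-based UUID namespace for URLs, as defined in RFC 4122.
    static final UUID NAMESPACE_URL =
            UUID.fromString("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

    // Session-scoped base: a fresh random UUID, so the caller must
    // remember which archive it belongs to.
    static String randomBase() {
        return "arcp://uuid," + UUID.randomUUID() + "/";
    }

    // Recreatable base: UUID v5 of the URL the archive was fetched from.
    static String locationBase(String sourceUrl) {
        return "arcp://uuid," + uuidV5(NAMESPACE_URL, sourceUrl) + "/";
    }

    // Recreatable base: SHA-256 of the archive bytes, base64url-encoded
    // without padding (the RFC 6920 "ni" style).
    static String hashBase(byte[] archiveBytes) {
        byte[] sha256 = digest("SHA-256", archiveBytes);
        String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(sha256);
        return "arcp://ni,sha-256;" + encoded + "/";
    }

    // java.util.UUID has no built-in v5 (SHA-1) factory, so build it by hand.
    static UUID uuidV5(UUID namespace, String name) {
        byte[] ns = ByteBuffer.allocate(16)
                .putLong(namespace.getMostSignificantBits())
                .putLong(namespace.getLeastSignificantBits())
                .array();
        byte[] hash = digest("SHA-1", ns, name.getBytes(StandardCharsets.UTF_8));
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // set version 5
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // set RFC 4122 variant
        ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
        return new UUID(first16.getLong(), first16.getLong());
    }

    private static byte[] digest(String algorithm, byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(randomBase());
        System.out.println(locationBase("http://example.com/data.zip"));
        System.out.println(hashBase("Hello World!".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Only the random variant needs state kept somewhere; the other two can be 
recomputed from the source URL or from the bytes whenever the archive is 
opened again.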

If you modify it to support arcp URIs instead, you could in theory allow 
mounting from any URI prefixed with arcp://ni, which would allow locating 
the source using the .well-known location. However you would still need to 
know which NI endpoint(s) to use - perhaps given as a system property or set 
up with a static method first. This could be a good way to optionally extend 
the GSOC project if there is time.  I can try to add .well-known/ni support 
to https://view.commonwl.org/ so we can test it there.
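
As a rough sketch of that lookup: given an arcp://ni, URI and a configured 
endpoint, the archive location could be derived along the lines of the 
RFC 6920 .well-known/ni mapping. The class name NiLocator, the method 
wellKnownLocation and the system property name arcp.ni.endpoint are all 
assumptions for illustration, nothing like this exists in the code base yet.

```java
import java.net.URI;

// Hypothetical helper, not an existing Taverna class.
class NiLocator {
    // Assumed property name; the real project may choose a different one.
    static final String ENDPOINT_PROPERTY = "arcp.ni.endpoint";

    // Maps arcp://ni,<alg>;<digest>/any/path to
    // <endpoint>/.well-known/ni/<alg>/<digest>
    static URI wellKnownLocation(String arcpUri, URI endpoint) {
        String prefix = "arcp://ni,";
        if (!arcpUri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an arcp://ni, URI: " + arcpUri);
        }
        // The authority ends at the first slash after the prefix (if any).
        int end = arcpUri.indexOf('/', prefix.length());
        String authority = end < 0
                ? arcpUri.substring(prefix.length())
                : arcpUri.substring(prefix.length(), end);
        // The authority has the form <alg>;<digest>.
        int sep = authority.indexOf(';');
        if (sep <= 0 || sep == authority.length() - 1) {
            throw new IllegalArgumentException("Expected <alg>;<digest>: " + authority);
        }
        String algorithm = authority.substring(0, sep);
        String digest = authority.substring(sep + 1);
        return endpoint.resolve("/.well-known/ni/" + algorithm + "/" + digest);
    }

    public static void main(String[] args) {
        // Fall back to a placeholder endpoint if the property is not set.
        URI endpoint = URI.create(
                System.getProperty(ENDPOINT_PROPERTY, "https://example.org/"));
        String arcp = "arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/manifest.json";
        System.out.println(wellKnownLocation(arcp, endpoint));
    }
}
```

The path inside the archive (here manifest.json) is deliberately ignored: 
the endpoint only serves the archive itself, and the path is resolved after 
the archive has been fetched and mounted.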

In general we might need different constructors or arguments to say which kind 
of arcp authority is desired, with javadoc to explain the differences. 

Another option could be to take this one step further with Commons RDF 
(rather than just opening such Paths), say to expose a complete Dataset with 
absolute arcp URIs for the manifest and all the referenced annotations. The 
arcp URIs would fit in very well here because you could look at the graphs, 
export them or whatever, and if you want to read a file described in the 
graph you can open it with the RO Bundle API. 

> GSOC: Use arcp:// URIs in RO Bundle
> -----------------------------------
>
>                 Key: TAVERNA-1037
>                 URL: https://issues.apache.org/jira/browse/TAVERNA-1037
>             Project: Apache Taverna
>          Issue Type: New Feature
>          Components: Taverna Language
>         Environment: Java, HTTP
>            Reporter: Stian Soiland-Reyes
>            Priority: Major
>              Labels: RFC, gsoc2018, java, uri
>
> This is a project idea for [Google Summer of 
> Code|https://summerofcode.withgoogle.com/] (GSOC). To discuss this or other 
> ideas with your potential mentor from the Apache Taverna project, sign up and 
> post to the 
> [dev@taverna|https://taverna.incubator.apache.org/community/lists.html#devtaverna]
>  list, including "[GSOC]" in the subject. You may also comment on this Jira 
> issue if you have created an account.
> --
> The 
> [ro-bundle|https://github.com/apache/incubator-taverna-language/tree/master/taverna-robundle]
>  module of Taverna Language currently uses app:// URIs in its Java 
> [FileSystem|https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html]
>  URIs and thus also in its RDF loading.
> This was in accordance with [RO bundle 
> spec|https://researchobject.github.io/specifications/bundle/#absolute-uris] – 
> however the app:// URI scheme has been since abandoned.
> The arcp URI scheme has been proposed as an alternative to describe paths 
> within an archive (e.g. ZIP file): 
> [https://tools.ietf.org/id/draft-soilandreyes-arcp-03.html] (This 
> Internet-Draft is progressing towards an RFC)
> This proposal suggests to modify RO Bundle to use arcp:// URIs – but not just 
> modify app://bf5a0cab-86d7-40da-b588-1ce4953ae13d/ to 
> arcp://uuid,bf5a0cab-86d7-40da-b588-1ce4953ae13d/ - but to support the other 
> mechanisms suggested by arcp in 
> [https://tools.ietf.org/id/draft-soilandreyes-arcp-03.html#rfc.section.4.1]
> That is, it should be possible to open an RO Bundle from a fixed URL as 
> identifier, or using its sha256 checksum in readonly mode.
> It should also be possible to lookup an RO Bundle URI from a .well-known 
> endpoint as defined in 
> [https://tools.ietf.org/id/draft-soilandreyes-arcp-03.html#rfc.section.4.4]
> Extensions to this project could be to add a Java URL handler so that URLs 
> from an opened RO Bundle file system also can be used as java.net.URLs. Also 
> it could develop a new arcp-java module similar to the reference Python 
> implementation https://pypi.python.org/pypi/arcp
> Prospective students are expected to participate in the Apache [Taverna 
> community|https://taverna.incubator.apache.org/community/lists.html#devtaverna]
>  - but are also welcome to join the [IETF|https://www.ietf.org] review 
> process in reviewing or improving the arcp Internet-Draft to progress it 
> towards RFC.
> Suggested mentor: Stian Soiland-Reyes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
