Re: GSoC 2016 Docker support for Taverna

Nadeesh Dilanga Wed, 23 Mar 2016 13:37:08 -0700

Hi Stian,
Thank you very much for the valuable feedback. I have completed the requested
changes. Please let me know if you see anything else.
I will submit this to GSoC tonight anyway, since I can still change it until
the 25th.


On Wed, Mar 23, 2016 at 1:39 PM, Stian Soiland-Reyes <[email protected]>
wrote:

> Thanks!
>
> Overall your proposal looks good! Precise and well structured. I like
> the diagram, it shows good understanding.
>
> "Travena" -> "Taverna"
>
> TAVERNA-879 proposes a way to execute a particular step of a Taverna
> workflow as if it were a command line tool. This could enable any "classic"
> Taverna workflow to be converted to a CWL workflow - not just those
> with the Tool Activity. Imagine a command line tool that takes a
> JSON/YAML file which corresponds to the existing configuration of any
> existing Taverna activity (e.g. R, WSDL, REST, Beanshell) - and runs a
> corresponding one-step workflow with corresponding input and output
> files.
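
To make the idea above concrete, here is a minimal sketch of how such a tool
might read a JSON description of a single activity and turn it into a one-step
workflow outline. The JSON layout, the field names ("activity",
"configuration", "inputs", "outputs") and the function name are all
hypothetical; they are not Taverna's actual configuration format, and nothing
is executed here.

```python
import json

# Hypothetical activity types, taken from the examples named in the thread.
KNOWN_ACTIVITIES = {"r", "wsdl", "rest", "beanshell", "tool"}


def one_step_workflow(config_text):
    """Turn a JSON activity description into a one-step workflow outline."""
    config = json.loads(config_text)
    activity = config["activity"].lower()
    if activity not in KNOWN_ACTIVITIES:
        raise ValueError(f"unsupported activity type: {activity}")
    return {
        # The single step carries the activity's own configuration unchanged.
        "steps": [
            {
                "activity": activity,
                "configuration": config.get("configuration", {}),
            }
        ],
        # Port names; each maps to an input or output file in the config.
        "inputs": sorted(config.get("inputs", {})),
        "outputs": sorted(config.get("outputs", {})),
    }


# Made-up example configuration for a REST step (assumed field names).
example = """{
  "activity": "REST",
  "configuration": {"urlTemplate": "http://example.org/{id}", "method": "GET"},
  "inputs": {"id": "id.txt"},
  "outputs": {"responseBody": "body.txt"}
}"""

workflow = one_step_workflow(example)
print(json.dumps(workflow, indent=2))
```

A real tool would then hand this outline to the Taverna engine and copy the
named files in and out; that part is left out of the sketch.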
>
> While this might not be an efficient way to do many of the Taverna
> steps in a non-Taverna CWL engine, it could be a nice transition -
> allowing you to start your workflow in Taverna, then save as CWL to
> run on any CWL engine (assuming no fancy iterations were done in the
> Taverna wf), and develop further by editing the CWL, replacing some of
> the Taverna steps with more 'native' tools, e.g. calling "curl" in a
> Docker image instead of a Taverna step that just does a REST call.
>
>
> TAVERNA-879 is a bit more experimental (and exciting!) and it is hard to
> tell how far you would get, so I would mark your Task 7 as optional in
> your proposal, but keep it in the schedule - thus you have the option
> to free up time at the end if for instance you struggle to capture the
> Docker metadata for task 6.
>
>
>
> Under "Deliverables" you say "The entire source code zipped" - I think
> we would prefer to follow the same pattern we used for last year's
> GSOC - where we ask the students to sign the Apache Individual
> Contributor License Agreement https://www.apache.org/licenses/icla.txt
> -
>
> Then you commit your code continually to our git repositories using
> GitHub pull requests. (If you don't like GitHub we can do git patches
> by email/Jira) - rather than a big ZIP at the end which we have to
> figure out. This part helps you learn how to interact with an open
> source project - and it teaches us how to interact with third-party
> submissions :)
>
> So I would change deliverable 4 to "Regular GitHub pull requests with
> source code" - we can agree on the repositories later - I guess
> docker-activity would be added to taverna-common-activities - while
> the TAVERNA-879 tool could be added to the taverna-command-line.
>
>
> As for testing it would be great to start with some example workflows
> which just run Docker with the existing Tool activity - you could
> develop these during your first 4 weeks as a way to get to know
> Taverna. And then we can transition those workflows for the new Docker
> activity in the Testing steps - and they can become a separate
> deliverable.
>
>
>
> On 23 March 2016 at 08:54, Nadeesh Dilanga <[email protected]> wrote:
> > Hi Stian et al,
> > Here I have drafted my proposal [1]. I would appreciate everyone's feedback
> > on the proposal. Please let me know if this does not align with your
> > original expectations for this project, or whether it needs any scope-level
> > changes.
> >
> > Apart from that, @TAVERNA-900 can you please clarify the following:
> >
> > "Create a Docker tool for executing Taverna activities (TAVERNA-879) -
> > *this allows any Taverna steps to be used by other CWL engines*"
> >
> > [1] -
> > https://docs.google.com/document/d/1DKYuzr2hA5brQ2xBz_AVQgMWXB5qm6rftbWoGbnbXrg/edit?usp=sharing
> >
> > On Tue, Mar 22, 2016 at 1:42 PM, Nadeesh Dilanga <[email protected]>
> > wrote:
> >
> >> Hi,
> >> Thank you very much for the quick response. I will go through these a bit
> >> more and get back to you if I hit any roadblocks.
> >>
> >> On Mon, Mar 21, 2016 at 10:15 PM, Stian Soiland-Reyes <[email protected]>
> >> wrote:
> >>
> >>> On 21 March 2016 at 00:51, Nadeesh Dilanga <[email protected]> wrote:
> >>>
> >>> > First of all, I apologize for the delayed response. I wanted to give
> >>> > myself a bit more time to understand what Taverna is and what exactly
> >>> > the expected outcome of the project is (the tutorials, related slide
> >>> > decks and YouTube videos were very helpful), because this will be my
> >>> > one and only GSoC proposal and I want it to be perfect!
> >>>
> >>> Thanks!  You don't have to do it perfect - just great! :-))
> >>>
> >>> > 1. Taverna is a BPMN-like (but more extensive and more widely scoped in
> >>> > features) workflow engine which has several ways of creating workflows
> >>> > and different interfaces for accessing them.
> >>>
> >>> While I guess we don't like to be compared with BPMN, I think you are
> >>> correct. :)
> >>>
> >>>
> >>> > 2. When creating workflows, one major extension point for catering to
> >>> > custom use cases is to plug in/create your own services/service types,
> >>> > which is a great model IMHO. And this project is in fact to write an
> >>> > adapter (an activity plugin, which I believe is the executor of a
> >>> > service invocation) for when someone needs to run something on Docker
> >>> > at some phase of their workflow.
> >>>
> >>> Correct - thus one could have a workflow with multiple tools from
> >>> different docker images.
> >>>
> >>>
> >>> > If #2 is correct, can you please provide me an example of a use case
> >>> > which led to this project idea, because I feel I may be missing
> >>> > something here. IMHO, even for Docker it will eventually be a service
> >>> > invocation from a workflow front, and what Taverna needs is some
> >>> > activity plugins that are aware of the particular transport protocols.
> >>>
> >>> We already have the Tool activity which allows you to run command line
> >>> tools - however such workflows are hard to share as anyone receiving
> >>> it may not have that tool installed, or in the same version/location.
> >>>
> >>> While approaches like https://www.debian.org/devel/debian-med/ and
> >>> BioLinux have helped towards "How to get it installed" - it then moves
> >>> the requirement to a particular operating system, which in a way is
> >>> worse.
> >>>
> >>> Docker solves the "How to consistently install this tool" problem -
> >>> and even works (almost) seamlessly from OS X and Windows. It adds nice
> >>> reproducibility aspects as you can mark the exact snapshot version of
> >>> the docker image you have used.
> >>>
> >>>
> >>> There are now also initiatives such as http://bioboxes.org/ (and to a
> >>> certain degree http://bio.tools/ ) which describe bioinformatics tools
> >>> as Docker images - thus these can in theory be used directly from
> >>> Taverna.
> >>>
> >>>
> >>> Perhaps part of the project would be to define a use case so we find
> >>> some actual command lines we want to run in a Taverna workflow - e.g.
> >>> to run HMMER for sequence alignment using
> >>> https://hub.docker.com/r/dockerbiotools/hmmer/ using sequences fetched
> >>> from an EBI web service?  I am not sure how much of the bioinformatics
> >>> side you would like to get into! :)
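
As a rough illustration of "running just command line tools that happen to be
packaged as Docker images", the sketch below builds (but does not run) a
`docker run` command line for the HMMER image mentioned above. The function
name, the `/data` mount point and the example host directory are assumptions
for illustration, and whether `hmmsearch` is on the path inside that
particular image is also assumed. A tag or digest could be appended to the
image name to pin the exact snapshot used.

```python
import shlex


def docker_command(image, tool_args, workdir=None):
    """Build (not run) a `docker run` command line for a containerised tool."""
    # --rm removes the container once the tool has finished.
    command = ["docker", "run", "--rm"]
    if workdir is not None:
        # Mount the host working directory so input/output files are visible
        # inside the container, and start the tool in that directory.
        command += ["-v", f"{workdir}:/data", "-w", "/data"]
    command.append(image)
    command += list(tool_args)
    return command


# Hypothetical invocation: print the HMMER search tool's help text.
cmd = docker_command(
    "dockerbiotools/hmmer",
    ["hmmsearch", "-h"],
    workdir="/tmp/run1",
)
print(shlex.join(cmd))
```

An activity could pass such a list to `subprocess.run` and collect the output
files from the mounted directory afterwards.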
> >>>
> >>>
> >>>
> >>> > (example: HTTP service hosted in Docker, HTTP activity plugin; Message
> >>> > Broker service hosted in Docker, you need an AMQP/MQTT-like activity
> >>> > plugin)
> >>>
> >>> Yes, but I don't think we want to run many of those kinds of services
> >>> from Taverna, I was thinking more of running just command line tools
> >>> that happen to be packaged as Docker images.
> >>>
> >>> > 3. Or is the case to invoke some composite applications that are
> >>> > deployed/installed in Docker, regardless of what the protocols are?
> >>>
> >>> No, this would get a bit more complex, so I would stay away from that
> >>> for the GSOC project - although of course the potential is very
> >>> interesting motivation as well.
> >>>
> >>> I think this is what I described in
> >>> https://issues.apache.org/jira/browse/TAVERNA-941
> >>>
> >>>
> >>> > If #3 is correct, what we run in the Docker container can be another
> >>> > Taverna workflow. If that is the case, your idea of "Save workflow as
> >>> > Docker image" will be a superb addition!
> >>>
> >>> Yes! It should then be possible! But.. why? :)  Run with older Taverna
> >>> version?
> >>>
> >>> One interesting thing, if there's also "Save workflow as Docker
> >>> image", could be that when such an image is added as a Docker image
> >>> in a workflow, Taverna would "unwrap" it and show the inner workflow.
> >>>
> >>> With Docker there's a big danger of going down the "It's turtles all
> >>> the way down" recursion - hence I tried to scope the GSOC ideas to be
> >>> more concrete about running command line tools.
> >>>
> >>>
> >>> > So with this, I would like to understand what the Taverna community
> >>> > expects from "Invoking Docker from Taverna" in this GSoC project, so
> >>> > that I can be more specific in my project proposal and make it the
> >>> > best project for this summer for Taverna.
> >>> >
> >>> >
> >>> >
> >>> > On Fri, Mar 18, 2016 at 7:18 AM, Stian Soiland-Reyes <[email protected]>
> >>> > wrote:
> >>> >
> >>> >> On 17 March 2016 at 15:22, alaninmcr <[email protected]> wrote:
> >>> >> >> I found Docker to be an excellent solution for scaling and easy
> >>> >> >> deployment, and obviously a hot topic these days in enterprises that
> >>> >> >> want to implement a microservices-based architecture/deployment for
> >>> >> >> low-footprint servers/services.
> >>> >> >>
> >>> >> >> I presume the idea behind Docker support for Taverna is NOT from a
> >>> >> >> microservice standpoint, but more from a packaging and deployment
> >>> >> >> perspective. Please correct me if I am wrong.
> >>> >>
> >>> >> No, you are right in that our current Docker ideas would not be about
> >>> >> creating Taverna (or a Taverna workflow) as a micro-service, but to use
> >>> >> Docker for execution.
> >>> >>
> >>> >> A similar aspect could be to use Docker to start up a set of
> >>> >> microservices accompanying the workflow, and then access them from the
> >>> >> Taverna workflow using the existing WSDL and REST activities.
> >>> >> This is something that I am interested in within the
> >>> >> http://bioexcel.eu/ project - but it is a bit more architecturally
> >>> >> challenging as it would mean things like dynamic port bindings in the
> >>> >> workflow configuration.
> >>> >>
> >>> >> I've tracked this as https://issues.apache.org/jira/browse/TAVERNA-941
> >>> >> but IMHO it would be too big a task for a GSOC project.
> >>> >>
> >>> >>
> >>> >> > There are two separate issues:
> >>> >> >
> >>> >> > https://issues.apache.org/jira/browse/TAVERNA-901 is to allow Taverna
> >>> >> > workflows to include steps that are tools inside Docker containers.
> >>> >> > That would be deployment of an existing Docker container.
> >>> >> >
> >>> >> > https://issues.apache.org/jira/browse/TAVERNA-879 is to create Docker
> >>> >> > containers for Taverna workflows. That is packaging and (because the
> >>> >> > containers will be part of a CWL workflow) deployment.
> >>> >>
> >>> >> Nadeesh, I've added your interest to
> >>> >>
> >>> >> https://cwiki.apache.org/confluence/display/TAVERNADEV/2016-03+GSOC+2016
> >>> >>
> >>> >> but if you are more interested in packaging for Docker, then perhaps
> >>> >> we could look at the existing Docker wrapping of Taverna Server
> >>> >>
> >>> >> https://hub.docker.com/r/taverna/taverna-server/
> >>> >> https://github.com/taverna-extras/taverna-server-docker
> >>> >>
> >>> >> and consider doing something similar for our command line tools
> >>> >> "executeworkflow" and "tavlang".
> >>> >>
> >>> >> That shouldn't take you too long - so you may want to prototype one of
> >>> >> TAVERNA-901 and TAVERNA-879 as well.
> >>> >>
> >>> >>
> >>> >> I know Dmitry used wsdl-generic as a command line tool as in
> >>> >> http://inb.bsc.es/documents/galaxygears/ which could also be
> >>> >> interesting as a Docker container (e.g. for running WSDL services
> >>> >> within a CWL workflow), but I am not sure where the source code for
> >>> >> that is (is that outside Apache, Dmitry?)
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> >> If that is the case, can you please clarify what the current
> >>> >> >> packaging and deployment model is?
> >>> >>
> >>> >>
> >>> >> For Taverna 2.5 we used install4j via Maven to package into an installer:
> >>> >>
> >>> >> https://github.com/apache/incubator-taverna-commandline/blob/old/taverna-commandline-product-core-20141228/pom.xml#L1712
> >>> >>
> >>> >> That's what made the installers we have at
> >>> >> https://taverna.incubator.apache.org/download/command-line-tool/
> >>> >>
> >>> >> One packaging task we could consider for Taverna 3.0 is to update
> >>> >>
> >>> >> https://github.com/apache/incubator-taverna-commandline/tree/master/taverna-commandline-product
> >>> >> to use install4j or similar to generate such installers also for
> >>> >> Taverna 3, which has a slightly different folder structure.
> >>> >>
> >>> >> As an open source project we have 5 licenses for Install4j, but we
> >>> >> have not asked the author yet if this is still valid under Apache.
> >>> >> Now that we are releasing under the Apache license instead of LGPL,
> >>> >> we would ironically be allowed to bundle the binary Oracle JRE rather
> >>> >> than having to use the open source OpenJDK builds.
> >>> >>
> >>> >> But I'm afraid such a task would not involve Docker - as I think most
> >>> >> users of Taverna Command line would not have Docker (or even the right
> >>> >> Java version) installed.
> >>> >>
> >>> >>
> >>> >>
> >>> >> > There is no current mechanism for packaging up something to run a
> >>> >> > specific Taverna workflow. You can run workflows from the command line
> >>> >> > tool or on a Taverna Server.
> >>> >>
> >>> >> Making a recipe for generating Docker images for running a particular
> >>> >> Taverna Workflow could be interesting. We could then have "Save
> >>> >> workflow as Docker image" built into Taverna!
> >>> >>
> >>> >> If you are thinking about such an idea, feel free to suggest it as a
> >>> >> new Jira task!
> >>> >>
> >>> >>
> >>> >>
> >>> >> Overall - you don't have to pick exactly our ideas - you can be
> >>> >> inspired by them and will have to write your own proposal about what
> >>> >> work you propose to do (which should be reasonably scoped and
> >>> >> scheduled) and say how Apache Taverna would benefit.
> >>> >>
> >>> >> Looking forward to hearing more about your ideas!
> >>> >>
> >>> >> --
> >>> >> Stian Soiland-Reyes
> >>> >> Apache Taverna (incubating), Apache Commons RDF (incubating)
> >>> >> http://orcid.org/0000-0001-9842-9718
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Stian Soiland-Reyes
> >>> Apache Taverna (incubating), Apache Commons RDF (incubating)
> >>> http://orcid.org/0000-0001-9842-9718
> >>>
> >>
> >>
>
>
>
> --
> Stian Soiland-Reyes
> Apache Taverna (incubating), Apache Commons RDF (incubating)
> http://orcid.org/0000-0001-9842-9718
>
