Re: GSoC 2016 Docker support for Taverna

Stian Soiland-Reyes Wed, 23 Mar 2016 10:40:27 -0700

Thanks!

Overall your proposal looks good! Precise and well structured. I like
the diagram, it shows good understanding.


"Travena" -> "Taverna"

TAVERNA-879 proposes a way to execute a particular step of a Taverna
workflow as if it were a command line. This could enable any "classic"
Taverna workflow to be converted to a CWL workflow - not just those
with the Tool Activity. Imagine a command line tool that takes a
JSON/YAML file corresponding to the configuration of any existing
Taverna activity (e.g. R, WSDL, REST, Beanshell) and runs it as a
one-step workflow, reading its inputs from files and writing its
outputs to files.
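As a rough illustration of the TAVERNA-879 idea, here is a minimal sketch that wraps one Taverna activity configuration as a CWL CommandLineTool, built as a Python dict (JSON is valid YAML, so a CWL engine can read it once serialised). The `taverna/activity-runner` image, the `taverna-activity` command and its `--config` / `--input-<port>` flags are all hypothetical placeholders; no such tool exists yet.

```python
import json


def activity_as_cwl_step(activity_type, configuration, inputs, outputs,
                         image="taverna/activity-runner"):
    """Wrap one Taverna activity as a CWL CommandLineTool description.

    Hypothetical: the image name, the 'taverna-activity' command and its
    flags stand in for whatever the TAVERNA-879 tool would provide.
    """
    # The activity configuration is written into the working directory
    # as activity.json, so the runner can read it like any other file.
    activity_json = json.dumps({"type": activity_type,
                                "configuration": configuration})
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "requirements": {
            "DockerRequirement": {"dockerPull": image},
            "InitialWorkDirRequirement": {
                "listing": [{"entryname": "activity.json",
                             "entry": activity_json}],
            },
        },
        "baseCommand": ["taverna-activity", "--config", "activity.json"],
        # One File input per Taverna input port (hypothetical flag scheme).
        "inputs": {
            port: {"type": "File", "inputBinding": {"prefix": "--input-" + port}}
            for port in inputs
        },
        # One File output per Taverna output port, collected by file name.
        "outputs": {
            port: {"type": "File", "outputBinding": {"glob": port + ".out"}}
            for port in outputs
        },
    }


if __name__ == "__main__":
    tool = activity_as_cwl_step(
        "beanshell", {"script": "out = in1.toUpperCase();"}, ["in1"], ["out"])
    print(json.dumps(tool, indent=2))
```

A real tool would probably also need to escape any `$(` inside the configuration, since CWL may treat such strings as parameter references.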

While this might not be an efficient way to run many of the Taverna
steps in a non-Taverna CWL engine, it could be a nice transition:
you start your workflow in Taverna, save it as CWL to run on any CWL
engine (assuming no fancy iterations were done in the Taverna
workflow), and develop it further by editing the CWL, replacing some
of the Taverna steps with more 'native' tools, e.g. calling "curl" in
a Docker image instead of a Taverna step that just does a REST call.
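To make the "replace a REST step with curl" idea concrete, here is a minimal sketch of a CWL CommandLineTool that fetches a URL with curl inside a Docker container, again built as a Python dict. The `curlimages/curl` image is an assumption (any image with `curl` on its PATH would do); the curl options used are standard ones.

```python
import json


def curl_get_tool(image="curlimages/curl"):
    """CWL CommandLineTool doing an HTTP GET with curl in a container.

    Assumption: the chosen image has 'curl' on its PATH.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "requirements": {"DockerRequirement": {"dockerPull": image}},
        # --fail makes HTTP errors (4xx/5xx) give a non-zero exit code,
        # so the CWL engine marks the step as failed.
        "baseCommand": ["curl", "--fail", "--silent", "--show-error",
                        "--location"],
        "inputs": {
            "url": {"type": "string", "inputBinding": {"position": 1}},
        },
        # Capture the response body from stdout into a file output.
        "stdout": "response.txt",
        "outputs": {"response": {"type": "stdout"}},
    }


if __name__ == "__main__":
    print(json.dumps(curl_get_tool(), indent=2))
```

Saved as e.g. `curl-get.cwl`, the output could then be run with something like `cwltool curl-get.cwl --url https://example.org/`.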


TAVERNA-879 is a bit more experimental (and exciting!) and it is hard
to tell how far you would get, so I would mark your Task 7 as optional
in your proposal, but keep it in the schedule. That way you have the
option to free up time at the end if, for instance, you struggle to
capture the Docker metadata for Task 6.



Under "Deliverables" you say "The entire source code zipped" - I think
we would prefer to follow the same pattern we used for last year's
GSoC, where we ask the students to sign the Apache Individual
Contributor License Agreement https://www.apache.org/licenses/icla.txt

Then you commit your code continually to our git repositories using
GitHub pull requests (if you don't like GitHub we can do git patches
by email/Jira), rather than a big ZIP at the end which we have to
figure out. This part helps you learn how to interact with an open
source project - and it teaches us how to handle third-party
submissions :)

So I would change deliverable 4 to "Regular GitHub pull requests with
source code" - we can agree on the repositories later. I guess
docker-activity would be added to taverna-common-activities, while
the TAVERNA-879 tool could be added to taverna-commandline.


As for testing, it would be great to start with some example workflows
which just run Docker with the existing Tool activity - you could
develop these during your first 4 weeks as a way to get to know
Taverna. We can then transition those workflows to the new Docker
activity in the Testing steps, and they can become a separate
deliverable.
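For those first example workflows, the Tool activity would essentially run a `docker run` command line. The helper below is a minimal sketch that builds such a command as an argument list; the `/data` mount point convention and the HMMER arguments in the usage example are assumptions for illustration (the image's entrypoint and file names are not checked), while `--rm`, `-v` and `-w` are standard `docker run` options.

```python
import shlex


def docker_tool_command(image, args, workdir, mount_point="/data"):
    """Build a 'docker run' command line for a Tool activity step.

    The host working directory is bind-mounted at mount_point (an
    assumed convention) so input and output files can be exchanged.
    """
    return [
        "docker", "run",
        "--rm",                              # remove the container afterwards
        "-v", f"{workdir}:{mount_point}",    # share the working directory
        "-w", mount_point,                   # run the tool inside it
        image,
        *args,
    ]


if __name__ == "__main__":
    cmd = docker_tool_command(
        "dockerbiotools/hmmer",
        ["hmmsearch", "globins4.hmm", "sequences.fasta"],
        "/tmp/run1",
    )
    # shlex.join (Python 3.8+) quotes each argument for a POSIX shell,
    # giving a string that could be pasted into a Tool activity script.
    print(shlex.join(cmd))
```

Building an argument list rather than a single string keeps paths with spaces safe; the string form is only produced at the end for display or for a shell script.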



On 23 March 2016 at 08:54, Nadeesh Dilanga <[email protected]> wrote:
> Hi Stian et al,
> Here I have drafted my proposal [1]. I would appreciate everyone's feedback
> on the proposal. Please let me know if it is not aligned with your original
> expectations for this project, or whether it needs any scope-level changes.
>
> Apart from that, regarding TAVERNA-900, can you please clarify the following:
>
> "Create a Docker tool for executing Taverna activities (TAVERNA-879) - *this
> allows any Taverna steps to be used by other CWL engines*"
>
> [1] -
> https://docs.google.com/document/d/1DKYuzr2hA5brQ2xBz_AVQgMWXB5qm6rftbWoGbnbXrg/edit?usp=sharing
>
> On Tue, Mar 22, 2016 at 1:42 PM, Nadeesh Dilanga <[email protected]>
> wrote:
>
>> Hi,
>> Thank you very much for the quick response. I will go through these in a
>> bit more detail and get back to you if I hit any roadblocks.
>>
>> On Mon, Mar 21, 2016 at 10:15 PM, Stian Soiland-Reyes <[email protected]>
>> wrote:
>>
>>> On 21 March 2016 at 00:51, Nadeesh Dilanga <[email protected]> wrote:
>>>
>>> > First of all, apologies for the delayed response. I wanted to give
>>> > myself a bit more time to understand what Taverna is and what exactly
>>> > the expected outcome of the project is (the tutorials, related slide
>>> > decks and YouTube videos were very helpful). This will be my one and
>>> > only GSoC proposal, and I want it to be perfect!
>>>
>>> Thanks!  You don't have to make it perfect - just great! :-))
>>>
>>> > 1. Taverna is a BPMN-like (but more extensive and more widely scoped in
>>> > features) workflow engine, which has several ways of creating workflows
>>> > and different interfaces for accessing them.
>>>
>>> While I guess we don't like to be compared with BPMN, I think you are
>>> correct. :)
>>>
>>>
>>> >  2. When creating workflows, one major extension point for custom use
>>> > cases is to plug in or create your own services/service types, which is
>>> > a great model IMHO. This project is in fact to write an adapter (an
>>> > activity plugin, which I believe is the executor of a service
>>> > invocation) for when someone needs to run something on Docker at some
>>> > stage of their workflow.
>>>
>>> Correct - thus one could have a workflow with multiple tools from
>>> different docker images.
>>>
>>>
>>> > If #2 is correct, can you please give me an example of a use case which
>>> > led to this project idea? I feel I may be missing something here,
>>> > because IMHO, even for Docker, it will eventually be a service
>>> > invocation from the workflow's point of view, and what Taverna needs is
>>> > some activity plugins that are aware of the particular transport
>>> > protocols.
>>>
>>> We already have the Tool activity, which allows you to run command line
>>> tools - however such workflows are hard to share, as anyone receiving
>>> them may not have that tool installed, or in the same version/location.
>>>
>>> While approaches like https://www.debian.org/devel/debian-med/ and
>>> BioLinux have helped with "How to get it installed", they move the
>>> requirement to a particular operating system, which in a way is
>>> worse.
>>>
>>> Docker solves the "How to consistently install this tool" problem -
>>> and even works (almost) seamlessly from OS X and Windows. It adds nice
>>> reproducibility aspects, as you can mark the exact snapshot version of
>>> the Docker image you have used.
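One way to mark the exact snapshot version is to reference the image by its content digest (`name@sha256:<hex>`) instead of a mutable tag such as `latest`. The sketch below rewrites an image reference into that pinned form; obtaining the digest (e.g. from `docker inspect` or the registry) is left out, and the parsing is a simplified assumption rather than Docker's full reference grammar.

```python
import re

# A sha256 digest as used in Docker image references.
_DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")


def pin_image(reference, digest):
    """Return 'repository@sha256:...' for an image reference.

    Drops any tag or existing digest. A colon only counts as a tag
    separator if it comes after the last '/', so registry ports such
    as 'localhost:5000/tool:1.0' are kept intact.
    """
    if not _DIGEST.match(digest):
        raise ValueError("expected 'sha256:' followed by 64 hex digits")
    repository = reference.split("@", 1)[0]    # drop an existing digest
    last_slash = repository.rfind("/")
    colon = repository.rfind(":")
    if colon > last_slash:                     # a tag, not a registry port
        repository = repository[:colon]
    return f"{repository}@{digest}"
```

Storing the pinned reference in the workflow, rather than the tag, means a later re-run pulls exactly the same image even if the tag has since moved.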
>>>
>>>
>>> There are now also initiatives such as http://bioboxes.org/ (and, to a
>>> certain degree, http://bio.tools/) which describe bioinformatics tools
>>> as Docker images - thus these can in theory be used directly from
>>> Taverna.
>>>
>>>
>>> Perhaps part of the project would be to define a use case so we find
>>> some actual command lines we want to run in a Taverna workflow - e.g.
>>> to run HMMER for sequence alignment using
>>> https://hub.docker.com/r/dockerbiotools/hmmer/ using sequences fetched
>>> from an EBI web service?  I am not sure how much of the bioinformatics
>>> side you would like to get into! :)
>>>
>>>
>>>
>>> > (example: for an HTTP service hosted in Docker you need an HTTP activity
>>> > plugin; for a message broker service hosted in Docker you need an
>>> > AMQP/MQTT-like activity plugin)
>>>
>>> Yes, but I don't think we want to run many of those kinds of services
>>> from Taverna; I was thinking more of running just command line tools
>>> that happen to be packaged as Docker images.
>>>
>>> > 3. Or is the case to invoke composite applications that are
>>> > deployed/installed in Docker, regardless of what the protocols are?
>>>
>>> No, this would get a bit more complex, so I would stay away from that
>>> for the GSOC project - although of course the potential is very
>>> interesting motivation as well.
>>>
>>> I think this is what I described in
>>> https://issues.apache.org/jira/browse/TAVERNA-941
>>>
>>>
>>> > If #3 is correct, what we run in the Docker container could be another
>>> > Taverna workflow. If that is the case, your idea of "Save workflow as
>>> > Docker image" would be a superb addition!
>>>
>>> Yes! It should then be possible! But.. why? :)  Run with older Taverna
>>> version?
>>>
>>> If there's also "Save workflow as Docker image", one interesting thing
>>> would be, when such an image is added to a workflow as a Docker step, to
>>> "unwrap" it and show the inner workflow in Taverna.
>>>
>>> With Docker there's a big danger of going down the "It's turtles all
>>> the way down" recursion - hence I tried to scope the GSOC ideas to be
>>> more concrete about running command line tools.
>>>
>>>
>>> >  So I would like to understand what the Taverna community expects
>>> > from "Invoking Docker from Taverna" in this GSoC project, so that I can
>>> > be more specific in my project proposal and make it the best project
>>> > for Taverna this summer.
>>> >
>>> >
>>> >
>>> > On Fri, Mar 18, 2016 at 7:18 AM, Stian Soiland-Reyes <[email protected]>
>>> > wrote:
>>> >
>>> >> On 17 March 2016 at 15:22, alaninmcr <[email protected]> wrote:
>>> >> >> I found Docker to be an excellent solution for scaling and easy
>>> >> >> deployment, and it is obviously a hot topic these days in
>>> >> >> enterprises that want to implement a microservices-based
>>> >> >> architecture/deployment for low-footprint servers/services.
>>> >> >>
>>> >> >> I presume the idea behind Docker support for Taverna is NOT from a
>>> >> >> microservice standpoint, but more from a packaging and deployment
>>> >> >> perspective. Please correct me if I am wrong.
>>> >>
>>> >> No, you are right in that our current Docker ideas would not be about
>>> >> creating Taverna (or a Taverna workflow) as a micro-service, but about
>>> >> using Docker for execution.
>>> >>
>>> >> A similar aspect could be to use Docker to start up a set of
>>> >> microservices accompanying the workflow, and then access them from
>>> >> the Taverna workflow using the existing WSDL and REST activities.
>>> >> This is something that I am interested in within the
>>> >> http://bioexcel.eu/ project - but it is a bit more architecturally
>>> >> challenging, as it would mean things like dynamic port bindings in the
>>> >> workflow configuration.
>>> >>
>>> >> I've tracked this as https://issues.apache.org/jira/browse/TAVERNA-941
>>> >> but IMHO it would be too big a task for a GSoC project.
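To illustrate what dynamic port bindings in the workflow configuration involve: each run may need to pick a free host port, start the microservice container on it, and then substitute that port into the REST activity's URL. A minimal sketch, where the container port 8080 default and the URL shape are hypothetical; asking the OS for port 0 is a common trick, but the port could in principle be taken by another process before Docker binds it.

```python
import socket


def free_local_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))      # port 0 = let the OS choose
        return s.getsockname()[1]


def service_binding(image, container_port=8080):
    """Return the 'docker run' arguments and the base URL for a service.

    The REST activity would then be configured with base_url at run
    time instead of a fixed, hard-coded address.
    """
    host_port = free_local_port()
    run_args = ["docker", "run", "--rm", "-d",
                "-p", f"127.0.0.1:{host_port}:{container_port}", image]
    base_url = f"http://127.0.0.1:{host_port}/"
    return run_args, base_url
```

The awkward part for Taverna is the second return value: the workflow definition would have to accept a URL that is only known once the container has started.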
>>> >>
>>> >>
>>> >> > There are two separate issues:
>>> >> >
>>> >> > https://issues.apache.org/jira/browse/TAVERNA-901 is to allow Taverna
>>> >> > workflows to include steps that are tools inside Docker containers.
>>> >> > That would be deployment of an existing Docker image.
>>> >> >
>>> >> > https://issues.apache.org/jira/browse/TAVERNA-879 is to create Docker
>>> >> > containers for Taverna workflows. That is packaging and (because the
>>> >> > containers will be part of a CWL workflow) deployment.
>>> >>
>>> >> Nadeesh, I've added your interest to
>>> >>
>>> >> https://cwiki.apache.org/confluence/display/TAVERNADEV/2016-03+GSOC+2016
>>> >>
>>> >> but if you are more interested in packaging for Docker, then perhaps
>>> >> we could look at the existing Docker wrapping of Taverna Server
>>> >>
>>> >> https://hub.docker.com/r/taverna/taverna-server/
>>> >> https://github.com/taverna-extras/taverna-server-docker
>>> >>
>>> >> and consider doing something similar for our command line tools
>>> >> "executeworkflow" and "tavlang".
>>> >>
>>> >> That shouldn't take you too long - so you may want to prototype one of
>>> >> TAVERNA-901 and TAVERNA-879 as well.
>>> >>
>>> >>
>>> >> I know Dmitry used wsdl-generic as a command line tool as in
>>> >> http://inb.bsc.es/documents/galaxygears/ which could also be
>>> >> interesting as a Docker container (e.g. for running WSDL services
>>> >> within a CWL workflow), but I am not sure where the source code for
>>> >> that is (is that outside Apache, Dmitry?)
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> >> If that is the case, can you please clarify what the current
>>> >> >> packaging/deployment model is?
>>> >>
>>> >>
>>> >> For Taverna 2.5 we used install4j via Maven to package into an installer:
>>> >>
>>> >> https://github.com/apache/incubator-taverna-commandline/blob/old/taverna-commandline-product-core-20141228/pom.xml#L1712
>>> >>
>>> >> That's what made the installers we have at
>>> >> https://taverna.incubator.apache.org/download/command-line-tool/
>>> >>
>>> >> One packaging task we could consider for Taverna 3.0 is to update
>>> >>
>>> >> https://github.com/apache/incubator-taverna-commandline/tree/master/taverna-commandline-product
>>> >>
>>> >> to use install4j or similar to generate such installers also for
>>> >> Taverna 3, which has a slightly different folder structure.
>>> >>
>>> >> As an open source project we have 5 licenses for install4j, but we
>>> >> have not yet asked the author if this is still valid under Apache.
>>> >> Now that we release under the Apache license instead of LGPL, we would
>>> >> ironically be allowed to bundle the binary Oracle JRE rather than
>>> >> having to use the open source OpenJDK builds.
>>> >>
>>> >> But I'm afraid such a task would not involve Docker - as I think most
>>> >> users of Taverna Command line would not have Docker (or even the right
>>> >> Java version) installed.
>>> >>
>>> >>
>>> >>
>>> >> > There is no current mechanism for packaging up something to run a
>>> >> > specific Taverna workflow. You can run workflows from the command line
>>> >> > tool or on a Taverna Server.
>>> >>
>>> >> Making a recipe for generating Docker images for running a particular
>>> >> Taverna Workflow could be interesting. We could then have "Save
>>> >> workflow as Docker image" built into Taverna!
>>> >>
>>> >> If you are thinking about such an idea, feel free to suggest it as a
>>> >> new Jira task!
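A "Save workflow as Docker image" recipe could be as simple as generating a Dockerfile that copies the workflow into an image which already has the command line tool installed. In this minimal sketch the base image `taverna/taverna-commandline` and the `/opt/taverna/executeworkflow.sh` path are hypothetical placeholders; only the Dockerfile instructions themselves (`FROM`, `COPY`, exec-form `ENTRYPOINT`) are standard.

```python
import json


def workflow_dockerfile(workflow_file,
                        base_image="taverna/taverna-commandline",
                        runner="/opt/taverna/executeworkflow.sh"):
    """Generate a Dockerfile that runs one specific Taverna workflow.

    Hypothetical: base_image and runner are placeholders for an image
    that has the Taverna command line tool installed.
    """
    target = "/workflow/" + workflow_file
    # Exec-form ENTRYPOINT is a JSON array; any extra 'docker run'
    # arguments (e.g. workflow inputs) are appended to it.
    entrypoint = json.dumps([runner, target])
    return "\n".join([
        f"FROM {base_image}",
        f"COPY {workflow_file} {target}",
        f"ENTRYPOINT {entrypoint}",
        "",
    ])


if __name__ == "__main__":
    print(workflow_dockerfile("hello.wfbundle"))
```

Using the exec form means `docker run <image> <args>` passes the arguments straight to the workflow runner, so the image behaves like a command line tool for that one workflow.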
>>> >>
>>> >>
>>> >>
>>> >> Overall - you don't have to pick exactly our ideas - you can be
>>> >> inspired by them and will have to write your own proposal about what
>>> >> work you propose to do (which should be reasonably scoped and
>>> >> scheduled) and say how Apache Taverna would benefit.
>>> >>
>>> >> Looking forward to hearing more about your ideas!
>>> >>
>>> >> --
>>> >> Stian Soiland-Reyes
>>> >> Apache Taverna (incubating), Apache Commons RDF (incubating)
>>> >> http://orcid.org/0000-0001-9842-9718
>>> >>
>>>
>>>
>>>
>>>
>>
>>



