Stian Soiland-Reyes created TAVERNA-901:
-------------------------------------------

             Summary: Run Docker from Taverna
                 Key: TAVERNA-901
                 URL: https://issues.apache.org/jira/browse/TAVERNA-901
             Project: Apache Taverna
          Issue Type: Story
          Components: Taverna Common Activities
            Reporter: Stian Soiland-Reyes


h2. GSOC: Add Docker support to Taverna

The proposed GSOC project is to add support for invoking Docker containers 
within Taverna by adding a Docker Activity plugin.

Tasks include:

* Propose JSON model for describing a {{docker run}} command
* (Optional) Validate Docker activity config, e.g. can the docker image be 
pulled?
* Investigate: New Docker activity, or modify existing External Tool activity?
* Make/modify a Taverna Activity plugin for executing Docker (may or may not be 
based on the External Tool activity)
* (Optional) Capture docker metadata and add to workflow run provenance (e.g. 
which docker image ID was pulled)
* (Optional) Add Bioboxes support
* (Optional) Integrate with CWL support (TAVERNA-900)
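
As a starting point for discussing the first task, here is a minimal sketch 
of what a JSON model for {{docker run}} could look like, and how it might be 
turned into a command line. The field names ({{image}}, {{command}}, {{env}}, 
{{volumes}}, {{ports}}, {{remove}}) and the function {{docker_run_args}} are 
assumptions made for illustration only, not an agreed Taverna configuration. 
The sketch is in Python for brevity; an actual Taverna activity would be 
written in Java.

```python
import json
import shlex

# Hypothetical JSON model; the field names are illustrative assumptions,
# not an agreed Taverna activity configuration.
EXAMPLE_CONFIG = """
{
  "image": "debian:8",
  "command": ["echo", "hello world"],
  "env": {"LANG": "C"},
  "volumes": [{"host": "/tmp/data", "container": "/data"}],
  "ports": [{"host": 8080, "container": 80}],
  "remove": true
}
"""


def docker_run_args(config):
    """Translate the hypothetical model into a `docker run` argument list."""
    args = ["docker", "run"]
    if config.get("remove", False):
        args.append("--rm")  # delete the container when it exits
    # Sort the environment variables so the command line is deterministic.
    for name, value in sorted(config.get("env", {}).items()):
        args += ["-e", "{}={}".format(name, value)]
    for volume in config.get("volumes", []):
        args += ["-v", "{}:{}".format(volume["host"], volume["container"])]
    for port in config.get("ports", []):
        args += ["-p", "{}:{}".format(port["host"], port["container"])]
    args.append(config["image"])  # the image is the only required field
    args += config.get("command", [])
    return args


if __name__ == "__main__":
    args = docker_run_args(json.loads(EXAMPLE_CONFIG))
    # Print a shell-quoted form of the command for inspection.
    print(" ".join(shlex.quote(a) for a in args))
```

Returning an argument list rather than a single string avoids shell quoting 
problems when the list is handed to a process launcher. Whether the real model 
should mirror the {{docker run}} options this closely is one of the questions 
the proposal would need to answer.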

Other Taverna/Docker-related tasks can of course also be proposed by the 
students.


h2. Docker

[Docker|https://www.docker.com/] is a Linux container virtualization platform. 
A Linux _container_ relies on special kernel features and, similar to a _chroot 
jail_, behaves like a separate machine, but unlike a virtual machine it does 
not have the overhead of virtualizing hardware. 

Docker is popular in the _devops_ movement as it provides an easy way to 
install dependencies for software development and deployment, e.g. to run 
servers for MySQL, Apache Solr or Node.js.

In brief, a _Docker Image_ contains a virtual Linux file system (e.g. a 
miniature Debian installation). A _Docker Container_ is a particular execution 
of a Docker Image, which typically runs a single process as installed within 
the container, and may have network ports exposed to the world, or have parts 
of the host computer's file system mounted within the inner container.

One great advantage of Docker is that it simplifies tool *installation*, as 
each Docker image is a _self-contained Linux distribution_ which doesn't have 
to be compatible with the host computer (beyond the kernel). 

For Windows and OS X users, Docker automatically manages a virtual machine 
running the Linux containers, but Docker containers can also be deployed on the 
cloud or a local cluster, e.g. using _Docker Machine_.

Docker images can be created from a {{Dockerfile}}, which lists the commands 
to run to prepare the image. Docker images can be chained together using 
_base images_ - for instance, to build on an image with MySQL, the Dockerfile 
says {{FROM mysql}}.

Thus Docker is also an important tool for *reproducibility*, as these images 
can be automatically kept up to date and are distributed through the [Docker 
hub|https://hub.docker.com/]. In bioinformatics, this has led to 
[Bioboxes|http://bioboxes.org/], a standard for creating interchangeable 
bioinformatics software containers.
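
Because a tag such as {{mysql}} can point to a newer image over time, a 
workflow run is easier to reproduce if it records exactly which image was 
used. The following is a minimal sketch, assuming the JSON printed by 
{{docker inspect}} for an image has been captured as a string. The sample 
values are placeholders, and the function name {{image_provenance}} and its 
output keys are hypothetical, not part of Taverna's provenance model.

```python
import json

# Trimmed, made-up sample in the shape of `docker inspect <image>` output;
# the ID and digest values below are placeholders, not real images.
SAMPLE_INSPECT = """
[
  {
    "Id": "sha256:0123abcd",
    "RepoTags": ["mysql:latest"],
    "RepoDigests": ["mysql@sha256:4567ef01"]
  }
]
"""


def image_provenance(inspect_output):
    """Pick out the fields that identify exactly which image was used."""
    entries = json.loads(inspect_output)
    if not entries:
        raise ValueError("docker inspect returned no entries")
    image = entries[0]
    return {
        # Hypothetical output keys, chosen for this sketch only.
        "imageId": image["Id"],
        "tags": image.get("RepoTags") or [],
        "digests": image.get("RepoDigests") or [],
    }


if __name__ == "__main__":
    print(image_provenance(SAMPLE_INSPECT))
```

How such a record would be attached to the workflow run provenance is left 
open here, as it depends on the design of the activity.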


h2. Taverna

[Apache Taverna|http://taverna.incubator.apache.org/] (incubating) is a 
Java-based workflow system with a graphical design interface. Taverna workflows 
can combine many different service types, including REST and WSDL services, 
command line tools, scripts (e.g. BeanShell, R) and custom plugins (e.g. 
BioMart).

Taverna workflows can be executed on the desktop, on the command line, or on a 
Taverna server installation, which can be controlled from a web portal, a 
mobile app, or integrated into third-party applications.

Taverna is used in a [wide range of 
sciences|http://taverna.incubator.apache.org/introduction/taverna-in-use/] for 
data analysis and processing, including bioinformatics, cheminformatics, 
biodiversity and musicology. Workflow engine features include provenance 
tracking, implicit parallelism/iterations, retry/failover and looping. 

Taverna workflows are commonly shared on 
[myExperiment|http://www.myexperiment.org], and can either be created 
graphically in the [Taverna 
workbench|http://taverna.incubator.apache.org/download/workbench/], 
programmatically using the [Taverna Language 
API|http://taverna.incubator.apache.org/download/language/] or by generating 
workflow definitions in the 
[SCUFL2|http://taverna.incubator.apache.org/documentation/scufl2/] format.


h2. Community engagement

Interested GSOC students are requested to engage early with the 
[dev@taverna|http://taverna.incubator.apache.org/community/lists#devtaverna] 
mailing list to describe their ideas for approaching this project, to clarify 
the tasks and for any questions and issues.

As a first step, the prospective applicant should leave a comment on this Jira 
issue to indicate their interest, and the GSOC mentors will be happy to assist 
with any questions. 

As the project starts, we expect the student to become part of the 
dev@taverna community and to regularly discuss their progress. 


h2. Mentors

An important part of GSOC is the personal mentoring from existing members of 
the open source community. Our job is not just to teach you how to successfully 
get through the GSOC programme, but also to motivate you and make sure you 
progress. We will show you how to contribute to open source, debug, improve, 
document, test and release your code as part of Apache Taverna. 

The GSOC mentors for Apache Taverna have experience guiding multiple earlier 
GSOC students and local students, and can be contacted privately for 
day-to-day interaction and troubleshooting. 

Mentors for this GSOC project:

* Stian Soiland-Reyes


