Have you considered Pegasus?

http://pegasus.isi.edu/

It isn't quite no-programming-at-all, but it uses an internal
XML-based workflow representation (the DAX) that lets end users describe
their pipeline with a small generator script in a variety of languages,
chiefly Python, Java, and Perl. From what I can see, these generators
aren't much more complicated than your shell-scripting example.
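
To make that concrete, here's a rough, untested sketch of what a Python DAX
generator for the first two steps of your example might look like. I'm writing
it from memory of the Pegasus DAX3 API, so treat the details (and the logical
file names) as approximate:

```python
# Rough, untested sketch of a Pegasus DAX generator covering the "correct"
# and "register" steps from the quoted script.  Written from memory of the
# Python DAX3 API; assumes atlas and subject basenames don't collide.
import glob
import os

from Pegasus.DAX3 import ADAG, File, Job, Link

dax = ADAG("neuro-pipeline")
corrected = {}  # basename -> (corrected File, Job that produces it)

# correct all images before we begin
for image in glob.glob("input/atlases/*") + glob.glob("input/subjects/*"):
    name = os.path.basename(image)
    raw = File(name)
    nuc = File("nuc_" + name)
    job = Job(name="correct")
    job.addArguments(raw, nuc)
    job.uses(raw, link=Link.INPUT)
    job.uses(nuc, link=Link.OUTPUT)
    dax.addJob(job)
    corrected[name] = (nuc, job)

# register all atlases to each subject, once both have been corrected
for atlas in glob.glob("input/atlases/*"):
    for subject in glob.glob("input/subjects/*"):
        aname = os.path.basename(atlas)
        sname = os.path.basename(subject)
        xfm = File("reg_%s_%s.xfm" % (aname, sname))
        job = Job(name="register")
        job.addArguments(corrected[aname][0], corrected[sname][0], xfm)
        job.uses(corrected[aname][0], link=Link.INPUT)
        job.uses(corrected[sname][0], link=Link.INPUT)
        job.uses(xfm, link=Link.OUTPUT)
        dax.addJob(job)
        dax.depends(parent=corrected[aname][1], child=job)
        dax.depends(parent=corrected[sname][1], child=job)

with open("pipeline.dax", "w") as f:
    dax.writeXML(f)
```

Pegasus then works out from the dependencies which jobs can run in parallel,
and, as I understand it, the same workflow can be planned onto HTCondor,
GridEngine, or other execution back ends without changing the generator.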



On Sat, Sep 27, 2014 at 1:55 AM, Gabriel A. Devenyi
<[email protected]> wrote:
> Hi Software-Carpentry Discuss,
>
> At the COmputational BRain Anatomy Lab at the Douglas Institute in Montreal,
> the Kimel Family Translational Imaging-Genetics Lab at CAMH in Toronto, and
> in neuroscience in general, we have a great need to stitch together many small
> command-line data processing tools (minc-toolkit, etc.) and run them against
> very large datasets. At some points in the pipeline, these tools could be run
> against all the input subjects in parallel, but at other points we need the
> previous steps to be completed so we can aggregate across subjects.
>
> In searching for a tool to manage this workflow, we have found a few
> (nipype, ruffus, taverna, pydpiper, joblib), but these tools either require us
> to program the file input/output management ourselves or to write new classes
> for the pipeline tool. This doesn't fit well with our user base of
> non-programmers who have a general understanding of scripting. We want to make
> it as easy as possible for them to transform a serial bash script into
> something that can run in parallel on a supercomputer.
>
> Having found no suitable tool, we are considering developing our own, which we
> have dubbed "Pipeliner - The stupid pipeline maker" and which will live at
> https://github.com/CobraLab/pipeliner
>
> We have posted a "functional" prototype of what Pipeliner would do; see
> https://github.com/CobraLab/pipeliner/issues/1
>
> Below is an example of serial bash code we'd like to be able to parallelize:
> ```sh
> # correct all images before we begin
> for image in input/atlases/* input/subjects/*; do
>     correct "$image" output/nuc/"$(basename "$image")"
> done
>
> # register all atlases to each subject
> for atlas in input/atlases/*; do
>     for subject in input/subjects/*; do
>         register "$atlas" "$subject" \
>             output/registrations/"$(basename "$atlas")"/"$(basename "$subject")"/reg.xfm
>     done
> done
>
> # create an average transformation for each subject
> for subject in input/subjects/*; do
>     subjectname=$(basename "$subject")
>     xfmaverage output/registrations/*/"$subjectname"/reg.xfm \
>         output/averagexfm/"$subjectname".xfm
> done
> ```
>
> This tool would generate an internal representation of a set of commands and
> then use a number of output plugins to generate bash scripts, GridEngine
> jobs, Slurm jobs, or other outputs.
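
For what it's worth, that internal-representation-plus-plugins core doesn't
have to be big. Here's a hypothetical sketch (untested, all names made up) of
a list of tasks with dependencies and two output plugins, one emitting a
serial bash script and one emitting GridEngine submissions with -hold_jid:

```python
# Hypothetical sketch of the "internal representation + output plugins"
# idea: tasks with named dependencies, emitted either as a serial bash
# script or as GridEngine submissions.  Untested; all names are made up.
class Task(object):
    def __init__(self, name, command, after=()):
        self.name = name          # unique task name
        self.command = command    # shell command to run
        self.after = list(after)  # names of tasks that must finish first

def emit_bash(tasks):
    # Serial output plugin: dependencies are honoured simply because the
    # tasks are already listed in a valid order.
    lines = ["#!/bin/bash", "set -e"]
    lines += [t.command for t in tasks]
    return "\n".join(lines)

def emit_gridengine(tasks):
    # GridEngine output plugin: one qsub per task, using -hold_jid so a
    # task waits for the jobs it depends on.
    lines = ["#!/bin/bash"]
    for t in tasks:
        hold = "-hold_jid %s " % ",".join(t.after) if t.after else ""
        lines.append("qsub -N %s %s-b y -cwd %s" % (t.name, hold, t.command))
    return "\n".join(lines)

# Example: correct two images in parallel, then aggregate over both
# ("average_all" is a placeholder command).
tasks = [
    Task("correct_a", "correct input/subjects/a output/nuc/a"),
    Task("correct_b", "correct input/subjects/b output/nuc/b"),
    Task("aggregate", "average_all output/nuc/* output/average",
         after=["correct_a", "correct_b"]),
]
print(emit_gridengine(tasks))
```

A Slurm plugin would be similar, though sbatch's --dependency option wants
numeric job ids rather than names.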
>
> Does anyone have experience creating workflows like this, or know of an
> existing tool we could use instead of rolling our own? We welcome comments,
> suggestions, pointers to projects that have already done this, and
> collaborators to help build this tool. Thanks, everyone, for your help!
>
>
> --
> Gabriel A. Devenyi B.Eng. Ph.D.
> e: [email protected]
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org



-- 
Mark Lee Stillwell
[email protected]

_______________________________________________
Discuss mailing list
[email protected]
http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
