Have you considered leaf (http://www.biomedcentral.com/1471-2105/14/201) or bpipe (http://m.bioinformatics.oxfordjournals.org/content/early/2012/04/11/bioinformatics.bts167.abstract)? They seem closer to what you're looking for.
FWIW, I have not used either tool, but have only read their publications.

Robert

On Sep 26, 2014 8:57 PM, "Gabriel A. Devenyi" <[email protected]> wrote:

> Hi Software-Carpentry Discuss,
>
> At the COmputational BRain Anatomy Lab at the Douglas Institute in
> Montreal, the Kimel Family Translational Imaging-Genetics Lab at CAMH in
> Toronto, and in neuroscience in general, we have a great need to stitch
> together many small command-line data processing tools (minc-toolkit etc.)
> and run them against very large datasets. At some points in the pipeline,
> these tools could be run against all the input subjects in parallel, but
> at other points we need the previous steps to be completed so we can
> aggregate across subjects.
>
> In searching for a tool to manage this workflow, we have found a few
> (nipype, ruffus, taverna, pydpiper, joblib). But we found that these tools
> either required us to program the file input/output management ourselves
> or to write new classes for the pipeline tool. This doesn't fit well with
> our user base of non-programmers who have a general understanding of
> scripting. We want to enable them to transform a serial bash script, as
> easily as possible, into something that can run in parallel on a
> supercomputer.
>
> Having found no such tool, we are considering developing our own, dubbed
> "Pipeliner - The stupid pipeline maker", which will live at
> https://github.com/CobraLab/pipeliner
>
> We have posted a "functional" prototype of what Pipeliner would do, see
> https://github.com/CobraLab/pipeliner/issues/1
>
> Below is an example of serial bash code we'd like to be able to
> parallelize:
>
> ```sh
> # correct all images before we begin
> for image in input/atlases/* input/subjects/*; do
>     correct $image output/nuc/$(basename $image)
> done
>
> # register all atlases to each subject
> for atlas in input/atlases/*; do
>     for subject in input/subjects/*; do
>         register $atlas $subject output/registrations/$(basename $atlas)/$(basename $subject)/reg.xfm
>     done
> done
>
> # create an average transformation for each subject
> for subject in input/subjects/*; do
>     subjectname=$(basename $subject)
>     xfmaverage output/registrations/*/$subjectname/reg.xfm output/averagexfm/$subjectname.xfm
> done
> ```
>
> This tool would generate an internal representation of a set of commands
> and then use a number of output plugins to generate bash scripts,
> GridEngine jobs, slurm jobs, or other outputs.
>
> Does anyone have experience creating workflows like this, or know of an
> existing tool we could use instead of rolling our own? We welcome
> comments, suggestions, projects that already did this, and collaborators
> to help build this tool. Thanks everyone for your help!
>
> --
> Gabriel A. Devenyi B.Eng. Ph.D.
> e: [email protected]
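For comparison, here is a minimal sketch of how the serial example above might be parallelized with nothing more than bash job control: each stage launches its jobs in the background and a `wait` acts as the barrier before the next, aggregating, stage starts. The `correct`, `register`, and `xfmaverage` commands and the directory layout are taken from the quoted example; the `mkdir -p` call is an assumption about output directories needing to exist.

```sh
#!/bin/bash
# Sketch: same three stages as the serial script, but each stage runs its
# jobs concurrently and "wait" blocks until they all finish, so an
# aggregating stage only starts once everything it depends on is done.

# stage 1: correct all images (independent of one another)
for image in input/atlases/* input/subjects/*; do
    correct "$image" "output/nuc/$(basename "$image")" &
done
wait

# stage 2: register every atlas to every subject (also independent)
for atlas in input/atlases/*; do
    for subject in input/subjects/*; do
        outdir="output/registrations/$(basename "$atlas")/$(basename "$subject")"
        mkdir -p "$outdir"   # assumption: output directories may not exist yet
        register "$atlas" "$subject" "$outdir/reg.xfm" &
    done
done
wait

# stage 3: average the transforms per subject (needs all of stage 2)
for subject in input/subjects/*; do
    subjectname=$(basename "$subject")
    xfmaverage output/registrations/*/"$subjectname"/reg.xfm \
        "output/averagexfm/$subjectname.xfm" &
done
wait
```

On a single workstation that stage/barrier structure is already enough; on a cluster the same structure maps onto job arrays and job dependencies.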
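Along the same lines, a sketch of what a slurm output plugin might emit for the first stage, assuming only standard sbatch features (job arrays, `--parsable`, `--dependency=afterok`). The file names `imagelist.txt` and `correct_stage.sh` are made up for the example; `correct` and the paths come from the quoted script.

```sh
#!/bin/bash
# Sketch: stage 1 as a slurm job array, one task per image.
ls input/atlases/* input/subjects/* > imagelist.txt
ntasks=$(wc -l < imagelist.txt)

cat > correct_stage.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=correct
# each array task pulls its own input line out of the image list
image=$(sed -n "${SLURM_ARRAY_TASK_ID}p" imagelist.txt)
correct "$image" "output/nuc/$(basename "$image")"
EOF

correct_job=$(sbatch --parsable --array=1-"$ntasks" correct_stage.sh)

# a later, aggregating stage would then be submitted with
#   sbatch --dependency=afterok:"$correct_job" average_stage.sh
# so it waits until every array task has finished successfully.
```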
_______________________________________________ Discuss mailing list [email protected] http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
