On Fri, Sep 26, 2014 at 08:55:56PM -0400, Gabriel A. Devenyi wrote:
> Hi Software-Carpentry Discuss,
>
> At the COmputational BRain Anatomy Lab at the Douglas Institute in
> Montreal, the Kimel Family Translational Imaging-Genetics Lab at CAMH in
> Toronto, and in neuroscience in general, we have a great need to stitch
> together many small command-line data processing tools (minc-toolkit etc.)
> and run them against very large datasets. At some points in the pipeline
> these tools can be run against all the input subjects in parallel, but at
> other points we need the previous steps to be complete so we can aggregate
> across subjects.
>
> In searching for a tool to manage this workflow we found a few candidates
> (nipype, ruffus, taverna, pydpiper, joblib), but each of them either
> requires us to program the file input/output management ourselves or to
> write new classes for the pipeline tool. That doesn't fit well with our
> user base of non-programmers who have a general understanding of
> scripting. We want to let them transform a serial bash script into
> something that can run in parallel on a supercomputer as easily as
> possible.

Have you seen Makeflow? I don't have any experience with it, but my
HPC-aware friends speak of it with approval (and they're an elitist
bunch, so ... :)
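For concreteness, the serial loops quoted below could probably be turned
into a Makeflow rule file by a small shell script along these lines. This
is a completely untested sketch: the Make-like "outputs: inputs / command"
rule syntax is the main assumption, the file layout is copied from the
quoted script, and the run/submit options should be checked against the
CCTools documentation.

```sh
# Untested sketch: emit a Makeflow rule file from the same directory layout
# used by the serial script quoted below, then let Makeflow schedule the
# independent rules in parallel.
mf=pipeline.mf
: > "$mf"    # start with an empty rule file

# One correction rule per image; no rule depends on another, so Makeflow
# is free to run them all at once.
for image in input/atlases/* input/subjects/*; do
  name=$(basename "$image")
  printf 'output/nuc/%s: %s\n\tcorrect %s output/nuc/%s\n\n' \
    "$name" "$image" "$image" "$name" >> "$mf"
done

# One registration rule per atlas/subject pair.  Assumption: registration
# consumes the *corrected* images here (the serial script passes the raw
# inputs, so adjust if that is really what is wanted).
for atlas in input/atlases/*; do
  for subject in input/subjects/*; do
    a=$(basename "$atlas"); s=$(basename "$subject")
    out=output/registrations/$a/$s/reg.xfm
    printf '%s: output/nuc/%s output/nuc/%s\n\tregister output/nuc/%s output/nuc/%s %s\n\n' \
      "$out" "$a" "$s" "$a" "$s" "$out" >> "$mf"
  done
done

# The per-subject average is the synchronisation point: it lists every
# registration for that subject as an input, so it only runs once all of
# them have finished.
for subject in input/subjects/*; do
  s=$(basename "$subject")
  deps=""
  for atlas in input/atlases/*; do
    deps="$deps output/registrations/$(basename "$atlas")/$s/reg.xfm"
  done
  printf 'output/averagexfm/%s.xfm:%s\n\txfmaverage%s output/averagexfm/%s.xfm\n\n' \
    "$s" "$deps" "$deps" "$s" >> "$mf"
done

# Run locally; the same file can be handed to a cluster scheduler through
# Makeflow's batch-system options (see the CCTools docs for the exact flags).
makeflow "$mf"
```

Because every rule names its inputs and outputs explicitly, the "correct
everything first" and "aggregate across subjects" orderings fall out of
the dependency graph rather than out of the order of the loops.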
best,
--titus

>
> Having found no such tool, we are considering developing our own, dubbed
> "Pipeliner - The stupid pipeline maker", which will live at
> https://github.com/CobraLab/pipeliner
>
> We have posted a "functional" prototype of what Pipeliner would do; see
> https://github.com/CobraLab/pipeliner/issues/1
>
> Below is an example of serial bash code we'd like to be able to parallelize:
> ```sh
> # correct all images before we begin
> for image in input/atlases/* input/subjects/*; do
>     correct $image output/nuc/$(basename $image)
> done
>
> # register all atlases to each subject
> for atlas in input/atlases/*; do
>     for subject in input/subjects/*; do
>         register $atlas $subject \
>             output/registrations/$(basename $atlas)/$(basename $subject)/reg.xfm
>     done
> done
>
> # create an average transformation for each subject
> for subject in input/subjects/*; do
>     subjectname=$(basename $subject)
>     xfmaverage output/registrations/*/$subjectname/reg.xfm \
>         output/averagexfm/$subjectname.xfm
> done
> ```
>
> This tool would generate an internal representation of a set of commands
> and then use a number of output plugins to generate bash scripts,
> GridEngine jobs, slurm jobs, or other outputs.
>
> Does anyone have experience creating workflows like this, or know of an
> existing tool we could use instead of rolling our own? We welcome comments,
> suggestions, projects that have already done this, and collaborators to
> help build this tool. Thanks, everyone, for your help!
>
> --
> Gabriel A. Devenyi B.Eng. Ph.D.
> e: [email protected]

--
C. Titus Brown, [email protected]
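As a concrete illustration of the "output plugins" Gabriel describes above,
a generated GridEngine version of the correction step might look something
like the hypothetical array-job script below. None of this comes from the
Pipeliner prototype: the job name, the task count, and the qsub directives
are placeholders, and the exact options vary between sites.

```sh
#!/bin/bash
# Hypothetical example of a generated GridEngine "output plugin" result:
# one array task per input image for the correction step.
#$ -cwd
#$ -N correct_images
#$ -t 1-100    # placeholder; the generator would fill in the real image count

# The generator would emit this list so every task sees the same ordering.
images=(input/atlases/* input/subjects/*)
image=${images[$((SGE_TASK_ID - 1))]}   # SGE array task IDs start at 1

correct "$image" output/nuc/"$(basename "$image")"
```

Later steps would then be submitted with a dependency on this job (for
example via qsub's -hold_jid option) to reproduce the "previous steps must
be complete" barrier before aggregating across subjects.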
