Hi Software-Carpentry Discuss,

At the COmputational BRain Anatomy Lab at the Douglas Institute in Montreal, at the Kimel Family Translational Imaging-Genetics Lab at CAMH in Toronto, and in neuroscience in general, there is a great need to stitch together many small command-line data-processing tools (minc-toolkit, etc.) and run them against very large datasets. At some points in the pipeline these tools can be run against all input subjects in parallel, but at other points all previous steps must be completed first so that we can aggregate across subjects.
In searching for a tool to manage this workflow, we have found a few candidates (nipype, ruffus, taverna, pydpiper, joblib). However, these tools either require programming the file input/output management or writing new classes for the pipeline tool. That doesn't fit well with our user base of non-programmers who have a general understanding of scripting. We want to enable them to transform a serial bash script, as easily as possible, into something that can run in parallel on a supercomputer.

Having found no such tool, we are considering developing our own, dubbed "Pipeliner - The stupid pipeline maker", which will live at https://github.com/CobraLab/pipeliner. We have posted a "functional" prototype of what Pipeliner would do; see https://github.com/CobraLab/pipeliner/issues/1

Below is an example of the serial bash code we'd like to be able to parallelize:

```sh
# correct all images before we begin
for image in input/atlases/* input/subjects/*; do
  correct $image output/nuc/$(basename $image)
done

# register all atlases to each subject
for atlas in input/atlases/*; do
  for subject in input/subjects/*; do
    register $atlas $subject output/registrations/$(basename $atlas)/$(basename $subject)/reg.xfm
  done
done

# create an average transformation for each subject
for subject in input/subjects/*; do
  subjectname=$(basename $subject)
  xfmaverage output/registrations/*/$subjectname/reg.xfm output/averagexfm/$subjectname.xfm
done
```

Pipeliner would generate an internal representation of a set of commands and then use a number of output plugins to generate bash scripts, GridEngine jobs, Slurm jobs, or other outputs.

Does anyone have experience creating workflows like this, or know of an existing tool we could use instead of rolling our own? We welcome comments, suggestions, pointers to projects that have already done this, and collaborators to help build this tool.

Thanks everyone for your help!

--
Gabriel A. Devenyi B.Eng. Ph.D.
e: [email protected]
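To make the fan-out / barrier / aggregate structure concrete, here is a minimal sketch of how the first and last stages of a script like the one above parallelize with nothing but bash job control (`&` plus `wait`). This is only an illustration of the pattern, not Pipeliner itself: `process` and `aggregate` are hypothetical placeholder functions standing in for real tools such as correct and xfmaverage, and the demo files are invented.

```sh
#!/usr/bin/env bash
# Hedged sketch: parallelizing independent loop iterations with bash
# job control. 'process' is a placeholder for a real per-subject tool
# (e.g. correct); 'aggregate' stands in for a cross-subject tool
# (e.g. xfmaverage). The demo inputs are fabricated for illustration.
set -e
mkdir -p demo/input demo/output
printf 'a\n' > demo/input/s1
printf 'b\n' > demo/input/s2
printf 'c\n' > demo/input/s3

process() { cp "$1" "$2"; }   # placeholder per-subject step
aggregate() { cat "$@"; }     # placeholder cross-subject step

# Stage 1: launch every independent iteration in the background ...
for f in demo/input/*; do
  process "$f" "demo/output/$(basename "$f")" &
done
# ... and 'wait' is the barrier: nothing below runs until all jobs finish.
wait

# Stage 2: aggregation across subjects, safe only after the barrier.
aggregate demo/output/* > demo/combined
```

The `wait` builtin is exactly the serial-to-parallel boundary described above: within a stage everything runs concurrently, and between stages the barrier guarantees all outputs exist before aggregation. A pipeline tool would presumably emit the equivalent of this (or GridEngine/Slurm job dependencies) from a declarative description.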
_______________________________________________ Discuss mailing list [email protected] http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
