On Fri, Sep 26, 2014 at 08:55:56PM -0400, Gabriel A. Devenyi wrote:
> Hi Software-Carpentry Discuss,
>
> At the COmputational BRain Anatomy Lab at the Douglas Institute in
> Montreal, the Kimel Family Translational Imaging-Genetics Lab at CAMH in
> Toronto, and in neuroscience in general, we have a great need to stitch
> together many small command-line data processing tools (minc-toolkit etc.)
> and run them against very large datasets. At some points in the pipeline
> these tools can be run against all the input subjects in parallel, but at
> other points we need the previous steps to be complete so we can aggregate
> across subjects.
>
> In searching for a tool to manage this workflow we found a few candidates
> (nipype, ruffus, taverna, pydpiper, joblib), but each of them either
> requires us to program the file input/output management ourselves or to
> write new classes for the pipeline tool. That doesn't fit well with our
> user base of non-programmers who have a general understanding of
> scripting. We want to let them transform a serial bash script into
> something that can run in parallel on a supercomputer as easily as
> possible.

Have you seen Makeflow? I don't have any experience with it, but my
HPC-aware friends speak of it with approval (and they're an elitist
bunch, so ... :)
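For concreteness, the serial loops quoted below could probably be turned
into a Makeflow rule file by a small shell script along these lines. This
is a completely untested sketch: the Make-like "outputs: inputs / command"
rule syntax is the main assumption, the file layout is copied from the
quoted script, and the run/submit options should be checked against the
CCTools documentation.

```sh
# Untested sketch: emit a Makeflow rule file from the same directory layout
# used by the serial script quoted below, then let Makeflow schedule the
# independent rules in parallel.
mf=pipeline.mf
: > "$mf"    # start with an empty rule file

# One correction rule per image; no rule depends on another, so Makeflow
# is free to run them all at once.
for image in input/atlases/* input/subjects/*; do
  name=$(basename "$image")
  printf 'output/nuc/%s: %s\n\tcorrect %s output/nuc/%s\n\n' \
    "$name" "$image" "$image" "$name" >> "$mf"
done

# One registration rule per atlas/subject pair.  Assumption: registration
# consumes the *corrected* images here (the serial script passes the raw
# inputs, so adjust if that is really what is wanted).
for atlas in input/atlases/*; do
  for subject in input/subjects/*; do
    a=$(basename "$atlas"); s=$(basename "$subject")
    out=output/registrations/$a/$s/reg.xfm
    printf '%s: output/nuc/%s output/nuc/%s\n\tregister output/nuc/%s output/nuc/%s %s\n\n' \
      "$out" "$a" "$s" "$a" "$s" "$out" >> "$mf"
  done
done

# The per-subject average is the synchronisation point: it lists every
# registration for that subject as an input, so it only runs once all of
# them have finished.
for subject in input/subjects/*; do
  s=$(basename "$subject")
  deps=""
  for atlas in input/atlases/*; do
    deps="$deps output/registrations/$(basename "$atlas")/$s/reg.xfm"
  done
  printf 'output/averagexfm/%s.xfm:%s\n\txfmaverage%s output/averagexfm/%s.xfm\n\n' \
    "$s" "$deps" "$deps" "$s" >> "$mf"
done

# Run locally; the same file can be handed to a cluster scheduler through
# Makeflow's batch-system options (see the CCTools docs for the exact flags).
makeflow "$mf"
```

Because every rule names its inputs and outputs explicitly, the "correct
everything first" and "aggregate across subjects" orderings fall out of
the dependency graph rather than out of the order of the loops.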
best,
--titus

>
> Having found no such tool, we are considering developing our own, dubbed
> "Pipeliner - The stupid pipeline maker", which will live at
> https://github.com/CobraLab/pipeliner
>
> We have posted a "functional" prototype of what Pipeliner would do; see
> https://github.com/CobraLab/pipeliner/issues/1
>
> Below is an example of serial bash code we'd like to be able to parallelize:
> ```sh
> # correct all images before we begin
> for image in input/atlases/* input/subjects/*; do
>     correct $image output/nuc/$(basename $image)
> done
>
> # register all atlases to each subject
> for atlas in input/atlases/*; do
>     for subject in input/subjects/*; do
>         register $atlas $subject \
>             output/registrations/$(basename $atlas)/$(basename $subject)/reg.xfm
>     done
> done
>
> # create an average transformation for each subject
> for subject in input/subjects/*; do
>     subjectname=$(basename $subject)
>     xfmaverage output/registrations/*/$subjectname/reg.xfm \
>         output/averagexfm/$subjectname.xfm
> done
> ```
>
> This tool would generate an internal representation of a set of commands
> and then use a number of output plugins to generate bash scripts,
> GridEngine jobs, slurm jobs, or other outputs.
>
> Does anyone have experience creating workflows like this, or know of an
> existing tool we could use instead of rolling our own? We welcome comments,
> suggestions, projects that have already done this, and collaborators to
> help build this tool. Thanks, everyone, for your help!
>
> --
> Gabriel A. Devenyi B.Eng. Ph.D.
> e: [email protected]

--
C. Titus Brown, [email protected]
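As a concrete illustration of the "output plugins" Gabriel describes above,
a generated GridEngine version of the correction step might look something
like the hypothetical array-job script below. None of this comes from the
Pipeliner prototype: the job name, the task count, and the qsub directives
are placeholders, and the exact options vary between sites.

```sh
#!/bin/bash
# Hypothetical example of a generated GridEngine "output plugin" result:
# one array task per input image for the correction step.
#$ -cwd
#$ -N correct_images
#$ -t 1-100    # placeholder; the generator would fill in the real image count

# The generator would emit this list so every task sees the same ordering.
images=(input/atlases/* input/subjects/*)
image=${images[$((SGE_TASK_ID - 1))]}   # SGE array task IDs start at 1

correct "$image" output/nuc/"$(basename "$image")"
```

Later steps would then be submitted with a dependency on this job (for
example via qsub's -hold_jid option) to reproduce the "previous steps must
be complete" barrier before aggregating across subjects.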
