Have you considered leaf (http://www.biomedcentral.com/1471-2105/14/201) or bpipe (http://m.bioinformatics.oxfordjournals.org/content/early/2012/04/11/bioinformatics.bts167.abstract)? They seem closer to what you're looking for.
FWIW, I have not used either tool, but have only read their publications.

Robert

On Sep 26, 2014 8:57 PM, "Gabriel A. Devenyi" <[email protected]> wrote:

> Hi Software-Carpentry Discuss,
>
> At the COmputational BRain Anatomy Lab at the Douglas Institute in
> Montreal, the Kimel Family Translational Imaging-Genetics Lab at CAMH in
> Toronto, and in neuroscience in general, we have a great need to stitch
> together many small command-line data processing tools (minc-toolkit etc.)
> and run them against very large datasets. At some points in the pipeline,
> these tools could be run against all the input subjects in parallel, but
> at other points we need the previous steps to be completed so we can
> aggregate across subjects.
>
> In searching for a tool to manage this workflow, we have found a few
> (nipype, ruffus, taverna, pydpiper, joblib). But we found that these tools
> either required us to program the file input/output management ourselves
> or to write new classes for the pipeline tool. This doesn't fit well with
> our user base of non-programmers who have a general understanding of
> scripting. We want to enable them to transform a serial bash script, as
> easily as possible, into something that can run in parallel on a
> supercomputer.
>
> Having found no such tool, we are considering developing our own, dubbed
> "Pipeliner - The stupid pipeline maker", which will live at
> https://github.com/CobraLab/pipeliner
>
> We have posted a "functional" prototype of what Pipeliner would do, see
> https://github.com/CobraLab/pipeliner/issues/1
>
> Below is an example of serial bash code we'd like to be able to
> parallelize:
>
> ```sh
> # correct all images before we begin
> for image in input/atlases/* input/subjects/*; do
>     correct $image output/nuc/$(basename $image)
> done
>
> # register all atlases to each subject
> for atlas in input/atlases/*; do
>     for subject in input/subjects/*; do
>         register $atlas $subject output/registrations/$(basename $atlas)/$(basename $subject)/reg.xfm
>     done
> done
>
> # create an average transformation for each subject
> for subject in input/subjects/*; do
>     subjectname=$(basename $subject)
>     xfmaverage output/registrations/*/$subjectname/reg.xfm output/averagexfm/$subjectname.xfm
> done
> ```
>
> This tool would generate an internal representation of a set of commands
> and then use a number of output plugins to generate bash scripts,
> GridEngine jobs, slurm jobs, or other outputs.
>
> Does anyone have experience creating workflows like this, or know of an
> existing tool we could use instead of rolling our own? We welcome
> comments, suggestions, projects that already did this, and collaborators
> to help build this tool. Thanks everyone for your help!
>
> --
> Gabriel A. Devenyi B.Eng. Ph.D.
> e: [email protected]
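For comparison, here is a minimal sketch of how the serial example above might be parallelized with nothing more than bash job control: each stage launches its jobs in the background and a `wait` acts as the barrier before the next, aggregating, stage starts. The `correct`, `register`, and `xfmaverage` commands and the directory layout are taken from the quoted example; the `mkdir -p` call is an assumption about output directories needing to exist.

```sh
#!/bin/bash
# Sketch: same three stages as the serial script, but each stage runs its
# jobs concurrently and "wait" blocks until they all finish, so an
# aggregating stage only starts once everything it depends on is done.

# stage 1: correct all images (independent of one another)
for image in input/atlases/* input/subjects/*; do
    correct "$image" "output/nuc/$(basename "$image")" &
done
wait

# stage 2: register every atlas to every subject (also independent)
for atlas in input/atlases/*; do
    for subject in input/subjects/*; do
        outdir="output/registrations/$(basename "$atlas")/$(basename "$subject")"
        mkdir -p "$outdir"   # assumption: output directories may not exist yet
        register "$atlas" "$subject" "$outdir/reg.xfm" &
    done
done
wait

# stage 3: average the transforms per subject (needs all of stage 2)
for subject in input/subjects/*; do
    subjectname=$(basename "$subject")
    xfmaverage output/registrations/*/"$subjectname"/reg.xfm \
        "output/averagexfm/$subjectname.xfm" &
done
wait
```

On a single workstation that stage/barrier structure is already enough; on a cluster the same structure maps onto job arrays and job dependencies.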
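Along the same lines, a sketch of what a slurm output plugin might emit for the first stage, assuming only standard sbatch features (job arrays, `--parsable`, `--dependency=afterok`). The file names `imagelist.txt` and `correct_stage.sh` are made up for the example; `correct` and the paths come from the quoted script.

```sh
#!/bin/bash
# Sketch: stage 1 as a slurm job array, one task per image.
ls input/atlases/* input/subjects/* > imagelist.txt
ntasks=$(wc -l < imagelist.txt)

cat > correct_stage.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=correct
# each array task pulls its own input line out of the image list
image=$(sed -n "${SLURM_ARRAY_TASK_ID}p" imagelist.txt)
correct "$image" "output/nuc/$(basename "$image")"
EOF

correct_job=$(sbatch --parsable --array=1-"$ntasks" correct_stage.sh)

# a later, aggregating stage would then be submitted with
#   sbatch --dependency=afterok:"$correct_job" average_stage.sh
# so it waits until every array task has finished successfully.
```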
_______________________________________________ Discuss mailing list [email protected] http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
