yes, many tools don't read from stdin, you're right.  in practice, i
actually have each task write its part to the node's local scratch
disk and also do implicit conversions in this step (e.g.
scatter fastq as fasta).  but not all clusters have a local
also, as you mentioned, the seek solution wouldn't work for compressed infiles.
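
for reference, a minimal sketch of the seek idea in python, not the actual psub code. it assumes a plain, uncompressed, line-oriented infile; the names chunk_bounds and read_chunk are made up for illustration. each task gets a byte range that is pushed forward to the next line boundary, so no line is split between tasks. for multi-line records such as fastq, the boundary would additionally have to be aligned to record starts, which this sketch does not attempt.

```python
import os


def chunk_bounds(path, n_tasks):
    """Return one (start, end) byte range per task, aligned to line starts."""
    size = os.path.getsize(path)
    bounds = []
    with open(path, "rb") as fh:
        start = 0
        for i in range(1, n_tasks + 1):
            if i == n_tasks:
                # the last task always runs to the end of the file
                end = size
            else:
                target = size * i // n_tasks
                if target <= start:
                    # a previous long line already covered this range
                    end = start
                else:
                    # finish the line that contains byte target - 1
                    fh.seek(target - 1)
                    fh.readline()
                    end = fh.tell()
            bounds.append((start, end))
            start = end
    return bounds


def read_chunk(path, start, end):
    """Seek to start and read only this task's byte range."""
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(end - start)
```

each task would call read_chunk with its own range, so the infile is never physically split on disk. this relies on being able to seek to an arbitrary byte offset, which is why it does not carry over to compressed infiles.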

as i try to avoid working on the galaxy internals, i implemented this
as a command-line utility.
<command>psub --fastqToFasta $infile --cat $outfile $infile
instead of the nonparallel: <command> $infile $outfile</command>

but it would be nice to see this functionality in galaxy.  i thought
about reimplementing this as a job runner but noticed
there was already

On Fri, Aug 26, 2011 at 12:41 PM, Duddy, John <> wrote:
> Many of the tools out there work on files, and assume they are supposed to 
> work on the whole file (or take arguments for subsets that vary from tool to 
> tool).
> I'm working on a way for Galaxy to handle all these tools transparently, even 
> if, as in my case, the files are compressed but the tools cannot read 
> compressed files.
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail:
> -----Original Message-----
> From: Edward Kirton []
> Sent: Friday, August 26, 2011 12:34 PM
> To: Duddy, John
> Cc:
> Subject: Re: [galaxy-dev] using Galaxy for map/reduce
> Not intending to hijack the thread, but in response to John's comment
> -- I, too, made a general solution for embarrassingly parallel problems
> but instead of splitting the large files on disk, I just use seek to
> move the file pointer so each task can grab its part.