On Mar 29, 2011, at 10:25 AM, Assaf Gordon wrote:

> Hi Peter,
> Peter Cock wrote, On 03/29/2011 05:39 AM:
>>> 2. the tool accepts FASTA and FASTQ in both Sanger and Illumina
>>> format (no more need for grooming). Illumina is the default for
>>> newly uploaded FASTQ files.
>> I think that's a bad idea - use Sanger FASTQ as the default to be
>> consistent with the rest of Galaxy, and also with CASAVA 1.8
>> Illumina machines will produce that too, see:
>> http://seqanswers.com/forums/showthread.php?t=8895
> Thanks for the link - very interesting read, I wasn't aware of it.
> However, for our local Galaxy server - I'm sticking with Illumina scale until 
> I see real samples with phred-33 in the wild.
> The defaults can be easily changed (in the XML file, simply assume a 
> different scale when the extension is "fastq"),
> or don't accept "fastq" at all and force the user to change the format to 
> either "fastqillumina" or "fastqsanger".
> I'll explain my reasoning:
> We (at our lab) deal mostly with Illumina FASTQ files, with the Illumina 
> scale.
> I'm trying to make life as easy as possible for our users.
> When they upload a FASTQ file, it is an Illumina FASTQ file by default, and I 
> want them to be able to use a workflow on it immediately.
> All of our internal tools assume Illumina scale.
> The one time I tried to make the built-in Bowtie tool available, I got 
> complaints like "why doesn't my FASTQ file appear in the input list?", 
> because the file was "fastq" and not "fastqsanger" (which is what grooming 
> produces). This is a silly technical step that should not be a concern to 
> users, so I'm taking it out of the equation here (not to mention that 
> grooming two 14GB FASTQ files for every lane is a huge waste of space and time).
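
When a tool accepts both Sanger and Illumina FASTQ, something has to decide which scale a plain "fastq" upload uses. The approach above is simply to assume a default for that extension; an alternative, sketched here in C purely for illustration, is to guess from the quality characters themselves. This is a common heuristic, not how Galaxy or the tool under discussion works, and the names (`guess_qual_scale`, `enum qual_scale`) and the 'J' cut-off are assumptions.

```c
/* Possible outcomes of this hypothetical scale-guessing heuristic. */
enum qual_scale { SCALE_UNKNOWN, SCALE_SANGER, SCALE_ILLUMINA };

/* Guess the quality encoding of one FASTQ quality line.
   Characters below ';' (59) cannot occur in phred+64 or Solexa data, so they
   point to Sanger (phred+33). Characters above 'J' (74) would mean a phred+33
   score above 41, which is ASSUMED here not to occur in real Illumina output,
   so they point to Illumina 1.3+ (phred+64). Anything else is ambiguous. */
enum qual_scale guess_qual_scale(const char *qual)
{
    int saw_low = 0, saw_high = 0;

    /* Scan up to the end of the line (NUL, LF or CR). */
    for (const char *p = qual; *p != '\0' && *p != '\n' && *p != '\r'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (c < ';')
            saw_low = 1;
        else if (c > 'J')
            saw_high = 1;
    }

    if (saw_low && !saw_high)
        return SCALE_SANGER;
    if (saw_high && !saw_low)
        return SCALE_ILLUMINA;
    return SCALE_UNKNOWN; /* both or neither seen: this line cannot decide */
}
```

A single quality line is often ambiguous (e.g. all characters between ';' and 'J'), so in practice one could scan many records and stop at the first decisive one, falling back to a configured default otherwise.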

We've gotten into the habit of grooming everything (all of our files are also 
Illumina FASTQ files), so I'm looking forward to the change.  I definitely 
share the concern about the space wasted by essentially having two copies of 
the same data in Galaxy.  We had looked into making Illumina the default for 
our local instance of Galaxy, but in the end we stuck with Sanger (although we 
have talked about "pregrooming" files coming off the sequencer).  The wasted 
time was annoying, so I wrote my own groomer in C that could groom one of our 
FASTQ files in about 5 minutes. If you ran several (6-12) of these custom 
grooming jobs at the same time on the same node, the run time would jump to 
~50 minutes for a 17GB FASTQ file due to I/O wait, but that was still much 
faster than the built-in groomer.
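
For readers curious what such a groomer involves, here is a minimal sketch in C. It is not the groomer described above (that code isn't shown); it assumes standard four-line FASTQ records (no wrapped sequence or quality lines) and Illumina 1.3+ phred+64 input, and shifts each quality character down by 31 (64 - 33). The function names and the 64 KiB buffer size are hypothetical. Old Solexa-scaled files, which can contain characters below '@', are rejected rather than converted, because Solexa scores need a non-linear mapping.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a quality string (or a chunk of one) from
   Illumina 1.3+ (phred+64) to Sanger (phred+33) in place, stopping at the
   end of the line. Returns 0 on success, -1 on a character outside '@'..'~'. */
int illumina_to_sanger_qual(char *qual)
{
    for (char *p = qual; *p != '\0' && *p != '\n' && *p != '\r'; ++p) {
        if (*p < '@' || *p > '~')
            return -1;
        *p = (char)(*p - 31); /* offset 64 -> offset 33 */
    }
    return 0;
}

/* Hypothetical groomer: copy a 4-line-per-record FASTQ stream from `in` to
   `out`, converting every 4th line (the quality line). Long lines are handled
   in chunks; the line counter only advances when a newline is seen.
   Returns 0 on success, -1 on bad input or an I/O error. */
int groom_stream(FILE *in, FILE *out)
{
    char buf[65536];
    unsigned long line = 0; /* 0-based index of the current FASTQ line */

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (line % 4 == 3 && illumina_to_sanger_qual(buf) != 0)
            return -1;
        if (fputs(buf, out) == EOF)
            return -1;
        if (strchr(buf, '\n') != NULL)
            ++line;
    }
    return ferror(in) ? -1 : 0;
}
```

Because the per-character work is a single subtraction, a groomer like this should spend most of its time reading and writing, which is consistent with many concurrent jobs on one node becoming dominated by I/O wait.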

Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153

Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

