On Thu, Oct 20, 2011 at 2:15 PM, Eric Cabot <ca...@biotech.wisc.edu> wrote:
>> I was not aware of this new naming. It seems like a terrible decision from
>> Illumina because now both reads in a pair technically have the same ID (but
>> a different description).
>
> This is not quite the case. Here are two fastq header lines for a pair of
> reads produced by Illumina's CASAVA 1.8:
>
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA

Yes, Illumina gives both read 1 and read 2 the same template ID
of XYZZY:123:D0ABCDEFG:7:1101:1445:2057 (much like the
two reads would have the same ID in a SAM/BAM file).

> The two key things to note, relevant to this discussion are:
>
> 1. A space character is used to split the fields into two groups.
> This is actually a good thing, because that particular character can NEVER
> appear in either a sequence or a quality line. This makes it easy to detect
> name lines as those beginning with "@" (a valid quality character) and also
> having a space. If you are writing a parser for the new Illumina fastq
> format, please don't break the names on spaces!

Yes, you could use the space as a sanity test for *this* style Illumina
FASTQ, and have a bespoke parser which treats this all specially.
But for a generic FASTQ parser you *should* split at the space.
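
To make that concrete, here is a rough Python sketch of the generic
approach: split the title line at the first whitespace into an identifier
and a description. The function name split_title is made up for this
example and is not taken from any existing parser.

```python
def split_title(title_line):
    """Split a FASTQ title line into (identifier, description).

    Hypothetical helper for illustration only: the identifier is the
    text after the leading "@" up to the first whitespace, and the
    rest of the line (possibly empty) is the description.
    """
    if not title_line.startswith("@"):
        raise ValueError("FASTQ title lines must start with '@'")
    parts = title_line[1:].rstrip("\r\n").split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


read1 = split_title("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA")
read2 = split_title("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA")
# Both reads of the pair end up with the same identifier.
print(read1[0] == read2[0])
```

With the two example headers above, both reads get the identifier
XYZZY:123:D0ABCDEFG:7:1101:1445:2057 and differ only in the description.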

The point is that Illumina have changed the meaning of their FASTQ
identifier: it used to be the template ID plus a /1 or /2 suffix, but
now it is just the common template ID used for both parts.
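
For scripts that still expect the old convention, one possible workaround
is to rebuild a /1 or /2 style identifier from the new title line. This is
only a sketch: legacy_style_id is a hypothetical name, and it assumes the
description starts with the read number followed by a colon, as in the
CASAVA 1.8 examples quoted above.

```python
def legacy_style_id(title_line):
    """Return an old-style identifier (template ID plus /1 or /2).

    Assumes a CASAVA 1.8 style title line whose description begins
    with the read number and a colon. Any other title line has its
    identifier returned unchanged.
    """
    if title_line.startswith("@"):
        title_line = title_line[1:]
    fields = title_line.split(None, 1)
    if not fields:
        return ""
    identifier = fields[0]
    if len(fields) == 2:
        # partition() gives an empty separator if there is no colon.
        read_number, colon, _rest = fields[1].partition(":")
        if colon and read_number in ("1", "2"):
            return identifier + "/" + read_number
    return identifier


print(legacy_style_id("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA"))
print(legacy_style_id("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA"))
```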

> 2. Apart from the read number, encoded as the digit immediately following
> the space, the two lines are identical--as they were with earlier CASAVA
> versions.  Why is this worse than two lines differing by "/1" vs. "/2"?

Because it is a change from the existing well-established convention,
which will require changes to hundreds of scripts and tools
(a guessed number, including users' bespoke scripts).

> An additional improvement with the new naming convention is that flowcell
> and run IDs, as well as a flag for not passing filters (where N means the
> read does pass filter), are now included.

Yes, that is good.
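
If you do want a bespoke parser for this style, the extra fields could be
pulled out along these lines. This is a sketch under assumptions: the field
order follows my understanding of Illumina's CASAVA 1.8 layout
(instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index),
and the names Casava18Title and parse_casava18_title are invented for the
example.

```python
from collections import namedtuple

# Field names are labels chosen for this example, not official names.
Casava18Title = namedtuple(
    "Casava18Title",
    "instrument run_id flowcell lane tile x y "
    "read_number is_filtered control_bits index",
)


def parse_casava18_title(title_line):
    """Parse a CASAVA 1.8 style FASTQ title line into named fields."""
    text = title_line.strip()
    if text.startswith("@"):
        text = text[1:]
    parts = text.split(None, 1)
    if len(parts) != 2:
        raise ValueError("No description found in %r" % title_line)
    id_fields = parts[0].split(":")
    desc_fields = parts[1].split(":")
    if len(id_fields) != 7 or len(desc_fields) != 4:
        raise ValueError("Not a CASAVA 1.8 style title: %r" % title_line)
    read_number, filter_flag, control_bits, index = desc_fields
    if read_number not in ("1", "2") or filter_flag not in ("Y", "N"):
        raise ValueError("Unexpected description in %r" % title_line)
    # "Y" means the read was filtered out, "N" means it passed the filter.
    return Casava18Title(
        *id_fields,
        read_number=int(read_number),
        is_filtered=(filter_flag == "Y"),
        control_bits=control_bits,
        index=index
    )


title = parse_casava18_title(
    "@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA"
)
print(title.run_id, title.flowcell, title.read_number, title.is_filtered)
```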

Peter
