Phil,

We also have a MiSeq and recently ran into the same question of how to 
demultiplex.

We have a local Galaxy instance and wrote some scripts to demultiplex the 
samples efficiently. However, you FIRST need to "convert" the primary fastq on 
the MiSeq into a (in their view) multiplex-identified fastq. In that step the 
final :0 in all your headers is converted into the multiplex or sample ID you 
gave it in the sample sheet! I see that yours are all zeros, which is not very 
helpful.
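
To make the header layout concrete, here is a minimal Python sketch (not our 
Perl script) that pulls the sample ID out of a header. The function name 
parse_header and the field labels are my own for illustration; it assumes the 
header looks exactly like the ones in this thread, with one space and four 
colon-separated fields after it.

```python
def parse_header(header):
    """Split a MiSeq-style FASTQ header into its parts.

    Assumed layout: '@<read id> <read number>:<filter flag>:<control>:<sample>'.
    """
    read_id, description = header.rstrip("\n").split(" ", 1)
    read_number, filter_flag, control, sample = description.split(":")
    return {
        "read_id": read_id,
        "read_number": int(read_number),
        "sample": sample,
    }


# Header before and after the sample sheet conversion (taken from this thread).
before = parse_header("@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:0")
after = parse_header("@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:2")
print(before["sample"], after["sample"])  # 0 2
```

A sample value of 0 means the read has not been assigned to a sample yet.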

After the sample sheet conversion we simply concatenate all the fastq files. 
You can then easily group the reads on the final multiplex ID and demultiplex 
them into separate files. In addition, you can split the forward and reverse 
reads by the <space>1 and <space>2 identifiers in the header. Many tools no 
longer require the conversion to /1 and /2, but it can easily be done locally 
with, for instance, sed on Unix. We converted it like this:



@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:2

@M00132:6:000000000-A0JG4:1:1:18014:1842 2:N:0:2

into

@M00132:6:000000000-A0JG4:1:1:18014:1842/1 1:N:0:2

@M00132:6:000000000-A0JG4:1:1:18014:1842/2 2:N:0:2



This helps because many tools read the identifier only up to the first space.
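
For illustration, the renaming and grouping could be sketched in Python as 
below. This is not the Perl script mentioned in this mail; the names 
add_pair_suffix and group_by_sample are hypothetical, the reads are assumed 
to be already parsed into (header, sequence, plus, quality) tuples, and the 
sequence and quality strings in the example are placeholders.

```python
import re
from collections import defaultdict

# Matches '<read id> <read number>:<rest>' on a header line (assumed layout).
HEADER_RE = re.compile(r"^(@\S+) ([12]):(\S+)$")


def add_pair_suffix(header):
    """Append /1 or /2 to the read ID and keep the description as it was."""
    match = HEADER_RE.match(header)
    if match is None:
        return header  # leave unexpected headers untouched
    read_id, read_number, rest = match.groups()
    return f"{read_id}/{read_number} {read_number}:{rest}"


def group_by_sample(records):
    """Group (header, sequence, plus, quality) tuples by the final header field."""
    groups = defaultdict(list)
    for header, sequence, plus, quality in records:
        sample = header.rsplit(":", 1)[-1]
        groups[sample].append((add_pair_suffix(header), sequence, plus, quality))
    return dict(groups)


# Placeholder reads; only the headers come from this thread.
records = [
    ("@M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:2", "ACGT", "+", "IIII"),
    ("@M00132:6:000000000-A0JG4:1:1:18014:1842 2:N:0:2", "ACGT", "+", "IIII"),
]
groups = group_by_sample(records)
print(groups["2"][0][0])  # @M00132:6:000000000-A0JG4:1:1:18014:1842/1 1:N:0:2
```

Each key in groups is one sample ID, so writing each list to its own file 
gives the demultiplexed output.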



I might put the scripts in the Tool Shed soon, but that might not be of much 
help. Otherwise, PM me and I will send you the script (Perl).



Alex





________________________________
From: galaxy-user-boun...@lists.bx.psu.edu 
[galaxy-user-boun...@lists.bx.psu.edu] on behalf of Philip Dean 
[philip.d...@nbt.nhs.uk]
Sent: Monday 27 February 2012 19:45
To: 'galaxy-u...@bx.psu.edu'
Subject: [galaxy-user] demultiplex Miseq data with separate index file.

I am using Galaxy main site to analyse MiSeq data of pooled samples. 
Essentially the run produces 3 fastq files consisting of R1, R2 read files and 
a separate index file. They are in the format below.

R1:                  @M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:0
                        Sequence data

R2:                  @M00132:6:000000000-A0JG4:1:1:18014:1842 2:N:0:0
                        Sequence data

Index:              @M00132:6:000000000-A0JG4:1:1:18014:1842 1:N:0:0
CTCGGT
+
<@@DFD

I would like to use Galaxy to demultiplex the samples and then analyse them 
individually. I have found Barcode Splitter (version 1.0.0) on Galaxy; however, 
this tool requires the index to be found at the beginning of the sequence. 
Therefore I am attempting to add the index sequence onto the end of the 
sequence read data. FASTQ joiner (version 1.0.0) joins fastq files; however, the 
fastqs to be joined must be distinguished by a /1 or /2 at the end of the 
sequence identifiers. Does anyone have any advice or experience of 
demultiplexing data in this format?
Thanks,
Phil



