Hi Martin, everyone,

I've been looking forward to doing this for a long time and finally found the time: I dove into the ShortRead C code to add some functionality for loading Illumina export files. I've added an option to the readAligned method, specific to the type "SolexaExport", that retrieves, in addition to the default information, the multiplex barcode and the paired read number (the 6th and 7th columns of the export file, which were ignored so far). Using this option also creates the sequence identifier (i.e. the one you get in a fastq file extracted from an export file) and populates the id slot of the AlignedRead object.
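To illustrate the identifier construction described above, here is a minimal C sketch. It is not the code from the attached diff. The function name export_line_to_id is hypothetical, and the sketch assumes the usual export layout in which the first eight tab-separated fields are machine, run, lane, tile, x, y, index (barcode) and read number (so the barcode and read number sit at zero-based positions 6 and 7), and that the identifier follows the common machine_run:lane:tile:x:y#index/read pattern.

```c
#include <stdio.h>
#include <string.h>

#define N_FIELDS 8    /* assumed: machine, run, lane, tile, x, y, index, read */
#define FIELD_LEN 64  /* maximum field length, including the terminating NUL */

/* Hypothetical helper, not part of ShortRead: build a FASTQ-style read
 * identifier from the first eight tab-separated fields of one export line.
 * Returns 0 on success, -1 if the line is malformed or `out` is too small. */
int export_line_to_id(const char *line, char *out, size_t out_len)
{
    char f[N_FIELDS][FIELD_LEN];
    const char *p = line;

    for (int i = 0; i < N_FIELDS; ++i) {
        size_t n = strcspn(p, "\t\n");   /* length of the current field */
        if (n >= FIELD_LEN)
            return -1;                   /* field too long for the buffer */
        memcpy(f[i], p, n);
        f[i][n] = '\0';
        p += n;
        if (i < N_FIELDS - 1) {
            if (*p != '\t')
                return -1;               /* fewer than eight fields */
            ++p;                         /* skip the tab separator */
        }
    }

    /* f[6] is the multiplex barcode, f[7] the paired read number. */
    int written = snprintf(out, out_len, "%s_%s:%s:%s:%s:%s#%s/%s",
                           f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

Empty index or read-number fields are copied through as empty strings, so how such lines should be labelled (for example, defaulting the read number to 1) is left open here.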

I've attached a diff of my local working copy against revision 44842 of ShortRead (the current one as of this morning), two example export files (one from a single-end (SE) and one from a paired-end (PE) sequencing experiment) and a small R script showing the modified usage.

I think this functionality would be very useful for people who, like me, have to analyze multiplexed PE data, and I'd be glad if it got integrated.

Finally, I'm far from being a C expert, so you might wish (or need) to optimize what I've written.

Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: [email protected]
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------


Attachment: my_copy_vs_revision_44842.diff
Description: Binary data

Attachment: test_SE_export.txt.gz
Description: GNU Zip compressed data

Attachment: test_PE_export.txt.gz
Description: GNU Zip compressed data

Attachment: test.R
Description: Binary data

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
