Hi Martin, everyone,

I've been looking forward to doing this for a long time, and I finally found the time. I dove into the ShortRead C code to add some functionality for loading Illumina export files. I've added an option to the readAligned method, specifically for the type "SolexaExport", that retrieves, in addition to the default information, the multiplex barcode and the paired read number (the 6th and 7th columns of the export file, which were ignored so far). Using this option also creates the sequence identifier (i.e. the one you get in a fastq file extracted from an export file) and populates the id slot of the AlignedRead object.
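To make the identifier construction concrete, here is a rough Python sketch. It is not the C code from the patch. It rests on several assumptions: the export file is tab-separated; its first fields are machine, run, lane, tile, x and y; the barcode and read number sit at zero-based positions 6 and 7; and the identifier follows the usual machine_run:lane:tile:x:y#barcode/read pattern. The function name parse_export_line is hypothetical.

```python
def parse_export_line(line):
    """Build a fastq-style identifier from one export line.

    Assumed layout (zero-based, tab-separated):
    0 machine, 1 run, 2 lane, 3 tile, 4 x, 5 y, 6 barcode, 7 read number.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")

    machine, run, lane, tile, x, y, barcode, read_no = fields[:8]

    # Assumed identifier pattern: machine_run:lane:tile:x:y#barcode/read
    seq_id = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{barcode}"
    if read_no:
        # Append the read number only when the field is not empty.
        seq_id += f"/{read_no}"

    return {
        "id": seq_id,
        "barcode": barcode,
        "read_number": int(read_no) if read_no else None,
    }


if __name__ == "__main__":
    # Made-up example line, not taken from the attached files.
    example = "HWUSI-EAS100R\t0001\t6\t73\t941\t1973\tATCACG\t1\tACGT\tIIII"
    print(parse_export_line(example))
```

The real option does this work in C while the file is being read, so the sketch only illustrates which fields are combined, not how ShortRead stores them.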
I've attached the diff of my local working copy against revision 44842 of ShortRead (the current one, as of this morning), two example export files (one from a single-end (SE) and one from a paired-end (PE) sequencing experiment), and a small R script showing the modified usage.
I think this functionality would be useful to people who, like me, have to analyze multiplexed PE data, and I'd be glad to see it integrated.
Finally, I'm far from being a C expert, so you may want (or need) to optimize what I've written.
Best,

---------------------------------------------------------------
Nicolas Delhomme
High Throughput Functional Genomics Center
European Molecular Biology Laboratory
Tel: +49 6221 387 8426
Email: [email protected]
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
my_copy_vs_revision_44842.diff
Description: Binary data
test_SE_export.txt.gz
Description: GNU Zip compressed data
test_PE_export.txt.gz
Description: GNU Zip compressed data
test.R
Description: Binary data
_______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
