Greetings

I am working on adapting readIntensities from ShortRead to handle the new Illumina intensity file format, *.cif. Illumina has dropped the leading zeros from the tile number in the file names. With the old naming style, list.files returns:

list.files(pattern="int.txt.p.gz")
 [1] "s_1_0001_int.txt.p.gz" "s_1_0002_int.txt.p.gz" "s_1_0003_int.txt.p.gz" "s_1_0004_int.txt.p.gz" "s_1_0005_int.txt.p.gz"
 [6] "s_1_0006_int.txt.p.gz" "s_1_0007_int.txt.p.gz" "s_1_0008_int.txt.p.gz" "s_1_0009_int.txt.p.gz" "s_1_0010_int.txt.p.gz"
[11] "s_1_0011_int.txt.p.gz" "s_1_0012_int.txt.p.gz" "s_1_0013_int.txt.p.gz" "s_1_0014_int.txt.p.gz" "s_1_0015_int.txt.p.gz"
[16] "s_1_0016_int.txt.p.gz" "s_1_0017_int.txt.p.gz" "s_1_0018_int.txt.p.gz" "s_1_0019_int.txt.p.gz" "s_1_0020_int.txt.p.gz"

which puts everything in the order in which one would like to read it. I believe this is because, with zero-padded tile numbers, lexical sorting matches the numeric order of the tiles.

The new scheme yields:

list.files(pattern="cif")
 [1] "s_1_1.cif"   "s_1_10.cif"  "s_1_100.cif" "s_1_101.cif" "s_1_102.cif" "s_1_103.cif" "s_1_104.cif" "s_1_105.cif" "s_1_106.cif"
[10] "s_1_107.cif" "s_1_108.cif" "s_1_109.cif" "s_1_11.cif"  "s_1_110.cif" "s_1_111.cif" "s_1_112.cif" "s_1_113.cif" "s_1_114.cif"
[19] "s_1_115.cif" "s_1_116.cif" "s_1_117.cif" "s_1_118.cif" "s_1_119.cif" "s_1_12.cif"  "s_1_120.cif" "s_1_13.cif"  "s_1_14.cif"

which complicates building the requisite data structures because the files are no longer listed in tile order.

The new convention is further complicated by the fact that the intensity files are now arranged in sub-folders by cycle and lane.

I could buffer everything until it's all read and then organize it appropriately, but it seems like it would be much simpler if I could get the vector into tile order instead of lexical order. I don't see a command or other simple way to do this, but I'm hoping someone will be able to offer a suggestion. Anybody have any ideas?
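
To make the goal concrete, here is a minimal sketch of the reordering in base R. It assumes every file name follows the s_<lane>_<tile>.cif pattern seen in the listing above; the function name sort_by_tile and the sample vector are made up for illustration and are not part of ShortRead. It has not been tried against a real run folder.

```r
# Hypothetical helper: reorder .cif file names by numeric tile number.
# Assumes names of the form s_<lane>_<tile>.cif (as in the listing above).
sort_by_tile <- function(files) {
    # Strip any directory part, then capture the digits just before ".cif"
    tile <- as.integer(sub("^s_[0-9]+_([0-9]+)\\.cif$", "\\1", basename(files)))
    # order() on the integer tile numbers gives tile order, not lexical order
    files[order(tile)]
}

# Made-up sample in the lexical order that list.files would return
files <- c("s_1_1.cif", "s_1_10.cif", "s_1_100.cif", "s_1_11.cif", "s_1_2.cif")
sort_by_tile(files)
# [1] "s_1_1.cif"   "s_1_2.cif"   "s_1_10.cif"  "s_1_11.cif"  "s_1_100.cif"
```

Because basename() is applied first, the same call should also work on paths returned by list.files(full.names=TRUE), though files from different cycle folders would still need to be grouped separately.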

Thanks

Mike

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
