Hello Martin and Everybody,

I tried your suggestion, and it works nicely when the number of reads is not too big.
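Martin's superDuperConsolidator, quoted further down, is a left fold: Reduce(append, list(a, b, c)) evaluates append(append(a, b), c), so every step builds a new, larger object from the running result. The sketch below uses plain integer vectors as stand-ins for AlignedRead objects. That substitution is an assumption made for illustration; it does not exercise ShortRead's own append() method.

```r
# Plain integer vectors stand in for AlignedRead objects (illustration only;
# ShortRead's append() method for AlignedRead is not exercised here).
x1 <- 1:3
x2 <- 4:5
x3 <- 6:9

superDuperConsolidator <- function(...) Reduce(append, list(...))

folded <- superDuperConsolidator(x1, x2, x3)
nested <- append(append(x1, x2), x3)  # the left-to-right expansion of Reduce()
identical(folded, nested)             # TRUE

# accumulate = TRUE keeps every intermediate result, showing how the
# running object grows after each append()
steps <- Reduce(append, list(x1, x2, x3), accumulate = TRUE)
vapply(steps, length, integer(1))     # 3 5 9
```

In other words, the one-shot call does not avoid the pairwise appends; it only hides them. Any limit that append() hits for two large objects will therefore also be hit by the fold.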
Successful example: I have three instances, aln000, aln050 and aln100, like these:

> aln000
class: AlignedRead
length: 9465484 reads; width: 36 cycles
chromosome: chr11.fa chr13.fa ... chr6.fa chr6.fa
position: 100667123 52735524 ... 121341376 25134423
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln050
class: AlignedRead
length: 8918057 reads; width: 36 cycles
chromosome: chr5.fa chr15.fa ... chr16.fa chr8.fa
position: 149155914 57872637 ... 95751778 36611628
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln100
class: AlignedRead
length: 11261186 reads; width: 36 cycles
chromosome: chr4.fa chr5.fa ... chr10.fa chr1.fa
position: 66224960 140647218 ... 69579797 16009268
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig

I can successfully apply the consolidating function:

> superDuperConsolidator <- function(...) Reduce(append, list(...))
> aln_000_100 <- superDuperConsolidator(aln000, aln050, aln100)
> aln_000_100
class: AlignedRead
length: 29644727 reads; width: 36 cycles
chromosome: chr11.fa chr13.fa ... chr10.fa chr1.fa
position: 100667123 52735524 ... 69579797 16009268
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig

Unsuccessful example: now I try to consolidate AlignedRead instances that are twice as big.

> aln000
class: AlignedRead
length: 21845985 reads; width: 36 cycles
chromosome: chr17.fa chr1.fa ... chr18.fa chr9.fa
position: 41890422 142562489 ... 57003322 108499164
strand: - - ... - +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln050
class: AlignedRead
length: 21961352 reads; width: 36 cycles
chromosome: chr18.fa chr16.fa ... chr15.fa chr9.fa
position: 88900833 22029306 ... 102993167 83200074
strand: - - ... + -
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> aln100
class: AlignedRead
length: 20865366 reads; width: 36 cycles
chromosome: chr1.fa chr12.fa ... chr15.fa chr9.fa
position: 99986382 14243887 ... 93339870 75136974
strand: + - ... - +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> superDuperConsolidator <- function(...) Reduce(append, list(...))
> aln_000_100 <- superDuperConsolidator(aln000, aln050, aln100)
Error in .local(.Object, ...) : 'length' must be a single non-negative integer
In addition: Warning message:
In width1 + width2 : NAs produced by integer overflow

I tried this with two different data sets, and both failed. So I believe the problem is the amount of data, not the data itself. The append() function also fails when I try to consolidate two AlignedRead instances of 50 million tags each.

Do you think that I have reached a limit, or is there a way to "grow" AlignedRead instances slowly and gently?

By the way, I am now using a server with very large memory, so memory efficiency is far less important than successful consolidation. sessionInfo() is the same.

Thank you,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

On Thu, Aug 27, 2009 at 6:45 PM, Martin Morgan <[email protected]> wrote:
> Hi Ivan --
>
> Ivan Gregoretti wrote:
>>
>> Hello everybody,
>>
>> Is there any memory efficient way to consolidate multiple AlignedRead
>> objects into one?
>>
>>
>> Example:
>>
>> Let's say that I have 10 AlignedRead instances, 10 million tags each.
>> Let's call those instances aln01 through aln10.
>>
>> I can consolidate two of them like this:
>>
>> aln <- append(aln01, aln02)
>
> I don't think there's anything built-in. You could try this
>
> superDuperConsolidator <- function(...)
>     Reduce(append, list(...))
>
> it might not be too bad memory-wise.
>
> Martin
>
>>
>> Can I consolidate all AlignedRead instances in a single shot? Something like
>> this:
>>
>> aln <- superDuperConsolidator(aln01, aln02, aln03, ..., aln10)
>>
>> Thank you,
>>
>> Ivan
>>
>> #########################################################
>> > sessionInfo()
>>
>> R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.3.27   lattice_0.17-25    BSgenome_1.13.10   Biostrings_2.13.34
>> [5] IRanges_1.3.60
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1
>>
>> #########################################################
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
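The warning in the failed run names width1 + width2 and integer overflow. One plausible reading, which is an assumption not confirmed in this thread, is that append() adds up the total number of stored bases (reads × 36 cycles) as an R integer, whose maximum is .Machine$integer.max (2^31 - 1 = 2147483647). The sketch below checks the read counts printed in the thread against that limit; cumulative_bases is a hypothetical helper written only for this check.

```r
# Read counts copied from the AlignedRead print-outs above; width is 36 cycles.
width <- 36

small <- c(aln000 = 9465484, aln050 = 8918057, aln100 = 11261186)
large <- c(aln000 = 21845985, aln050 = 21961352, aln100 = 20865366)

# Hypothetical helper: total bases held after each successive append() in
# Reduce(append, ...). Computed in doubles, so this arithmetic cannot overflow.
cumulative_bases <- function(reads, width) cumsum(reads) * width

limit <- .Machine$integer.max  # 2147483647

sum(small)                                # 29644727, the consolidated length shown above
cumulative_bases(small, width) > limit    # FALSE FALSE FALSE
cumulative_bases(large, width) > limit    # FALSE FALSE TRUE: only the last append crosses it

# Two objects of 50 million tags each
2 * 50e6 * width > limit                  # TRUE
```

Under this assumption, the smaller set stays near 1.07e9 bases, while the larger set reaches about 2.33e9 bases on the final append, and two 50-million-tag objects would need 3.6e9. That pattern matches which runs succeeded and which failed, but it does not by itself prove where in ShortRead or Biostrings the overflow occurs.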
