Hi Joseph -- (your emails contain both a To: and a Cc: bioc-sig-seq, this means that two copies of each post make it to the list).
joseph <[email protected]> writes: > Hi > I have 4 data files generated from 2 lanes of a Solexa paired-end run. > The qa function worked fine, but the readAligned gave me an error: >>qaSummary = qa("/../data", type="SolexaExport") >> qaSummary > class: SolexaExportQA(9) > QA elements (access with qa[["elt"]]): > readCounts: data.frame(4 3) > baseCalls: data.frame(4 5) > readQualityScore: data.frame(6144 4) > baseQuality: data.frame(376 3) > alignQuality: data.frame(279 3) > frequentSequences: data.frame(600 4) > sequenceDistribution: data.frame(4928 4) > perCycle: list(2) > baseCall: data.frame(800 4) > quality: data.frame(3830 5) > perTile: list(2) > readCounts: data.frame(1200 4) > medianReadQualityScore: data.frame(1200 4) > >> qaSummary[["readCounts"]] > read filtered aligned > s_2_1_export.txt 12915438 7665822 7080343 > s_2_2_export.txt 12915438 7665822 7277877 > s_3_1_export.txt 17296866 7452595 7892453 > s_3_2_export.txt 17296866 7452595 8086091 >> > > When I tried to read all 4 files, I got the error below: >>aln = readAligned("/../data", type="SolexaExport") > Error: Input/Output > 'readAligned' failed to parse files > dirPath: '/../data' > pattern: '' > type: 'SolexaExport' > error: negative length vectors are not allowed "/../data" says "start at the root directory of the file system and go one level up, then down to the 'data' directory". Is this what you mean, or perhaps "./../data"? readAligned as invoked above has no 'pattern' argument, and so will match all files in the directory '/../data'. Likely what you want is to add an argument pattern=".*_export.*" or similar; using list.files("/../data") is a good way to see what you're trying to read in, for instance list.files("/../data", ".*_export.*") The error itself likely comes from trying to allocate a very large object. In general at least for the initial stages of an analysis a good work flow might read a 'large' file, perform some processing to result in a 'small' object, read the next, etc., and finally combine the small objects. This is what qa() does (visit an _export file, summarize, visit the next, combine results into the object that you refer to as qaSummary); a simple (and too naive) start to a ChIP-seq analysis might generate a list of files containing aligned reads and then summarize where in the genome the reads align to using ShortRead::coverage (I did not test the following code), files <- list.files(dirPath, ".*_export.*", full=TRUE) cvg <- lapply(files, function(file) { aln <- readAligned(dirname(file), basename(file), type="SolexaExport") coverage(aln) }) the results of coverage() are quite small and manageable, and the cvg object contains data from all 'files'; using srapply rather than lapply would allow this to be distrbuted across processors. Martin > >> sessioInfo() > R version 2.8.1 Patched (2009-03-03 r48046) > i386-apple-darwin9.6.0 > locale: > en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > attached base packages: > [1] tools stats graphics grDevices utils datasets methods > [8] base > other attached packages: > [1] ShortRead_1.0.6 lattice_0.17-20 Biobase_2.2.2 > Biostrings_2.10.16 > [5] IRanges_1.0.13 > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793 _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
