Pratap, Abhishek wrote:
> Hi Ivan
>
> I suspected it but not 100% sure. My % cpu for R process fluctuates btwn
> (60-100) and swap usage looks ok to me.
>
> I remember there was some talk on the mailing list that dev version
> (R/ShortRead) is a lot faster.
Hi Abhishek,

In my experience qa takes about 5 minutes per lane, so yours does seem
to be running a long time. The ... arguments to qa are passed to the
function that reads individual files (readAligned), so you can include a
verbose=TRUE argument for a bit more chat. You might write a short
script along the lines of

  gcinfo(TRUE)                 # report each garbage collection
  library(ShortRead)
  dirPath <- "some/directory"
  pattern <- "<some_pattern>"
  ## stop early unless the pattern matches exactly the expected files
  stopifnot(identical(list.files(dirPath, pattern), <files I'm expecting>))
  qa <- qa(dirPath, pattern, type=<my type>, verbose=TRUE)
  save(qa, file=<some file>)

Try running this from the command line

  R -f MyScript.R

The gcinfo(TRUE) will cause R to start printing messages about

  Garbage collection 3 = 2+0+1 (level 0) ...
  7.4 Mbytes of cons cells used (39%)
  1.3 Mbytes of vectors used (21%)
  Garbage collection 4 = 3+0+1 (level 0) ...
  10.3 Mbytes of cons cells used (55%)

which indicates that R is busy managing its memory even before starting
to do real work. So give R more memory until it quiets down

  R --min-nsize=20M --min-vsize=4G -f MyScript.R

(these values are my best guess at what is appropriate; the 'M' is
'million', the 'G' Giga).

qa() should be reading one file at a time, so the memory requirement is
set by the largest lane (the one with the largest product of reads and
cycles). You should be able to get a handle on the size of that using
readAligned(). How many reads and cycles are there in your data?

Martin

> Thanks,
> -Abhi
>
> -----Original Message-----
> From: Ivan Gregoretti [mailto:[email protected]]
> Sent: Thursday, September 10, 2009 5:01 PM
> To: Pratap, Abhishek
> Cc: Martin Morgan; [email protected]
> Subject: Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data
>
> It sounds like you may have run out of memory on your Linux box.
>
> When I run qa() on my 16GB machine, it usually uses ~14GB just for
> this qa() process.
>
> That is for 36 bases. Maybe, if you are running 75 bases, you just
> used all the RAM.
>
> Is the processor running 100%? Check it by issuing 'top' at the command
> line.
> If it is, then you are good.
>
> 'top' can also tell you if you are swapping wildly. (Swapping is when
> your machine runs out of RAM and starts storing data in a
> temporary location on your hard drive to avoid crashing.)
>
> Ivan
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
> On Thu, Sep 10, 2009 at 4:35 PM, Pratap, Abhishek
> <[email protected]> wrote:
>> Hi Martin
>>
>> I am noticing lethargic or maybe hung processing with the qa()
>> function in ShortRead. I know I have raised this question before.
>> Recently I have updated my R to dev version and installed latest
>> Bioconductor. Currently I am trying to run qa() on 8 lanes of data for
>> 75 bp reads. The machine has 16 cores with 16 GB RAM.
>>
>> It has been two hours since the processing started. Does it
>> usually take so long? I am not sure. Will using Rmpi help?
>>
>> Thanks,
>>
>> -Abhi
>>
>> sessionInfo()
>>
>> R version 2.9.2 (2009-08-24)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.8
>> [5] IRanges_1.2.3
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
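
A rough sketch of the reads-by-cycles sizing described in the reply above, in base R only. The helper name estimate_lane_gib, the 10 million read count, and the figure of two bytes per read position (one for the base call, one for the quality score) are assumptions made for illustration; none of them come from this thread or from ShortRead. The real footprint is likely to be larger because of read identifiers, alignment columns, and temporary copies made by R.

```r
# Hypothetical helper (not part of ShortRead): rough memory needed to hold
# one lane, assuming a fixed number of bytes per read position.
estimate_lane_gib <- function(reads, cycles, bytes_per_position = 2) {
    stopifnot(reads >= 0, cycles >= 0, bytes_per_position > 0)
    reads * cycles * bytes_per_position / 1024^3
}

# Assumed example: 10 million reads (a made-up count) at 75 cycles,
# the read length mentioned in the thread.
gib <- estimate_lane_gib(reads = 10e6, cycles = 75)
cat(sprintf("roughly %.1f GiB for sequences and qualities\n", gib))
```

With real data, the read count and cycle count for the largest lane would replace the assumed values; comparing the result with the machine's RAM gives a first idea of whether a single lane fits comfortably.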
