On UNIX/Linux, which seems to be the case here, you can follow your R
process externally by attaching with strace (or truss, trace, or
whatever it might be called on your system), i.e. do "strace -p <pid>".
This should tell you which file is being read at the moment, and how
well the reading is going. You can also do "ls -l /proc/<pid>/fd" for
a snapshot of the open files. Or try top or ps (with the right options).
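The /proc snapshot mentioned above can be wrapped in a small shell function. This is only a sketch for Linux systems that expose /proc; the function name list_open_files is made up for illustration, and the demo call inspects the current shell ($$) rather than an actual R process.

```shell
# Hypothetical helper: print each open file descriptor of a process and
# what it points to (regular file, pipe, socket, ...). Linux /proc only.
list_open_files() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # every entry under fd/ is a symlink to the open object
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Demo on the current shell; substitute the pid of the R process.
list_open_files "$$"
```

Running it a few times in a row against the R process should show whether the input file it has open changes from one lane to the next.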
On Sep 10, 2009, at 6:16 PM, Martin Morgan wrote:
Pratap, Abhishek wrote:
Hi Ivan
I suspected it but am not 100% sure. The %CPU for my R process
fluctuates between 60 and 100, and swap usage looks OK to me.
I remember there was some talk on the mailing list that the dev
version (R/ShortRead) is a lot faster.
Hi Abhishek,
My experience is in the 5 minutes / lane range for qa, so it would
seem to be running a long time. The ... arguments to qa are passed to
the function that reads individual files (readAligned), so you can
include a verbose=TRUE argument for a bit more chat. You might write a
short script along the lines of
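gcinfo(TRUE)  # report every garbage collection
library(ShortRead)
dirPath <- "some/directory"
pattern <- "<some_pattern>"
# stop early unless the directory holds exactly the expected files
stopifnot(setequal(list.files(dirPath, pattern), <files I'm expecting>))
# verbose=TRUE is passed through ... to readAligned
qa <- qa(dirPath, pattern, type=<my type>, verbose=TRUE)
save(qa, file=<some file>)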
Try running this from the command line
R -f MyScript.R
The gcinfo(TRUE) call will cause R to start printing messages like
Garbage collection 3 = 2+0+1 (level 0) ...
7.4 Mbytes of cons cells used (39%)
1.3 Mbytes of vectors used (21%)
Garbage collection 4 = 3+0+1 (level 0) ...
10.3 Mbytes of cons cells used (55%)
which indicates that R is busy managing its memory even before
starting to do real work. So give R more memory until it quiets down
R --min-nsize=20M --min-vsize=4G -f MyScript.R
(these values are my best guess at what is appropriate; the 'M' is
'million', the 'G' is Giga).
qa() should be reading one file at a time, so the memory requirement
is set by the largest lane, where size is the product of reads and
cycles. You should be able to get a handle on the size of that using
readAligned().
How many reads and cycles are there in your data?
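To make the "product of reads and cycles" point concrete, here is a back-of-the-envelope sketch. It assumes one byte per base plus one byte per quality score, which is only a lower bound and not a statement about how ShortRead actually stores a lane; the function name and the figure of 10 million reads are made-up illustration values.

```shell
# Rough lower bound on per-lane memory: reads * cycles bytes for the bases
# plus the same again for the quality scores. Ignores read ids, alignment
# columns and any temporary copies made while parsing.
estimate_lane_bytes() {
    reads=$1
    cycles=$2
    echo $(( reads * cycles * 2 ))
}

# Illustration only: 10 million reads of 75 cycles.
bytes=$(estimate_lane_bytes 10000000 75)
echo "at least $(( bytes / 1000000 )) MB for sequences and qualities"
# prints: at least 1500 MB for sequences and qualities
```

On the same assumption, going from 36 to 75 cycles roughly doubles the footprint for the same number of reads.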
Martin
Thanks,
-Abhi
-----Original Message-----
From: Ivan Gregoretti [mailto:[email protected]]
Sent: Thursday, September 10, 2009 5:01 PM
To: Pratap, Abhishek
Cc: Martin Morgan; [email protected]
Subject: Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data
It sounds like you may have run out of memory on your Linux box.
When I run qa() on my 16GB machine, it usually uses ~14GB just for
this qa() process.
That is for 36 bases. Maybe, if you are running 75 bases, you just
used all the RAM.
Is the processor running at 100%? Check by issuing 'top' at the command
line. If it is, then you are good.
'top' can also tell you if you are swapping wildly. (Swapping is when
your machine runs out of RAM and starts storing data in a
temporary location on your hard drive to avoid crashing.)
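Besides watching 'top', swap usage can be read directly from /proc/meminfo. The following is a minimal sketch for Linux; the function name swap_used_kb is hypothetical, and the SwapTotal and SwapFree fields are the standard ones reported by the kernel in kB.

```shell
# Report swap currently in use, in kB, as SwapTotal minus SwapFree.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {printf "%d\n", total - free}' /proc/meminfo
}

echo "swap in use: $(swap_used_kb) kB"
```

A number that keeps climbing between calls while qa() runs would suggest the process has outgrown RAM.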
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
On Thu, Sep 10, 2009 at 4:35 PM, Pratap, Abhishek
<[email protected]> wrote:
Hi Martin
I am noticing lethargic or maybe hung processing with the qa()
function in ShortRead. I know I have raised this question before.
Recently I updated my R to the dev version and installed the latest
Bioconductor. Currently I am trying to run qa() on 8 lanes of data for
75 bp reads. The machine has 16 cores and 16 GB RAM.
It has been two hours since the processing started. Does it usually
take so long? I am not sure. Will using Rmpi help?
Thanks,
-Abhi
sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.8
[5] IRanges_1.2.3
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing