On UNIX/Linux, which seems to be the case here, you can follow your R process externally by attaching to it with strace (or truss, trace, or whatever it is called on your system), i.e. run "strace -p <pid>". This should tell you which file is being read at the moment and whether the reading is still making progress. You can also run "ls -l /proc/<pid>/fd" for a snapshot of the open files. Or try top or ps (with the right options).
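A minimal shell sketch of the /proc snapshot, assuming a Linux system with /proc mounted; the helper name show_open_files is hypothetical. For a running qa(), the listing should include the lane file currently being read.

```shell
# Hypothetical helper: print a snapshot of the files a process has open.
# Assumes Linux, where /proc/<pid>/fd holds one symlink per open descriptor.
show_open_files() {
    pid="$1"
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "no such process or /proc not available: $pid" >&2
        return 1
    fi
    # Each entry is a symlink from the descriptor number to the opened path.
    ls -l "/proc/$pid/fd"
}

# Usage, with the PID of the R process as reported by top or ps:
#   show_open_files 12345
# For a live view of the system calls instead of a snapshot:
#   strace -p 12345
```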

On Sep 10, 2009, at 6:16 PM, Martin Morgan wrote:

Pratap, Abhishek wrote:
Hi Ivan

I suspected it, but I am not 100% sure. The %CPU for my R process fluctuates between 60 and 100, and swap usage looks OK to me.

I remember there was some talk on the mailing list that the dev version (R/ShortRead) is a lot faster.

Hi Abhishek,

My experience is in the range of 5 minutes per lane for qa, so yours
does seem to be running a long time. The ... arguments to qa are passed
to the function that reads individual files (readAligned), so you can
include a verbose=TRUE argument for a bit more chat. You might write a
short script along the lines of

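  gcinfo(TRUE)                    # report every garbage collection
  library(ShortRead)
  dirPath <- "some/directory"     # directory holding the lane files
  pattern <- "<some_pattern>"     # regular expression selecting them
  expected <- c("<files I'm expecting>")
  ## stop early unless the pattern selects exactly the expected files
  stopifnot(setequal(list.files(dirPath, pattern), expected))
  qa <- qa(dirPath, pattern, type="<my type>", verbose=TRUE)
  save(qa, file="<some file>")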

Try running this from the command line

  R -f MyScript.R

The gcinfo(TRUE) call will cause R to start printing messages such as

Garbage collection 3 = 2+0+1 (level 0) ...
7.4 Mbytes of cons cells used (39%)
1.3 Mbytes of vectors used (21%)
Garbage collection 4 = 3+0+1 (level 0) ...
10.3 Mbytes of cons cells used (55%)

which indicates that R is busy managing its memory even before starting
to do real work. So give R more memory until it quiets down:

  R --min-nsize=20M --min-vsize=4G -f MyScript.R

(these values are my best guess at what is appropriate; 'M' means
'million' (cells, for --min-nsize) and 'G' means 'giga' (bytes, for
--min-vsize)).
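If the script is left to run unattended, a sketch like the following keeps R's output in a log file and prints the PID so the process can then be watched with top, strace -p, or /proc. The helper name run_logged and the log name qa.log are hypothetical; both stdout and stderr go to the log so the gcinfo() messages are captured whichever stream R uses. The R flags are the ones suggested above.

```shell
# Hypothetical helper: run a command in the background, immune to hangups,
# with stdout and stderr both sent to a log file.
# Prints the PID of the background job so it can be monitored later.
run_logged() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"
}

# Usage, with the memory settings suggested above:
#   pid=$(run_logged qa.log R --min-nsize=20M --min-vsize=4G -f MyScript.R)
#   tail -f qa.log        # follow the garbage-collection messages
#   top -p "$pid"         # watch CPU and memory of that R process
```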

qa() should be reading one file at a time, so the memory requirement is
set by the largest lane (the one with the largest product of reads and
cycles). You should be able to get a handle on the size of that using
readAligned().
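The point that memory is driven by the single largest lane, not by all lanes together, can be made concrete with a little arithmetic. This is a hypothetical sketch: it assumes you have already counted the reads per lane (for example with readAligned()) and typed them as lines of the form "lane reads cycles". It prints the lane with the largest reads times cycles product, roughly the number of bases that must be held in memory at once.

```shell
# Hypothetical helper: read lines "lane reads cycles" on stdin and print the
# lane whose reads * cycles product (total bases) is largest, plus that product.
largest_lane() {
    awk '
        NF == 3 {
            bases = $2 * $3
            if (bases > max) { max = bases; lane = $1 }
        }
        END { if (lane != "") printf "%s %.0f\n", lane, max }
    '
}

# Usage with made-up read counts (not from this thread):
#   printf "s_1 12000000 75\ns_2 15000000 75\n" | largest_lane
#   # prints: s_2 1125000000
```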

How many reads and cycles are there in your data?

Martin

Thanks,
-Abhi

-----Original Message-----
From: Ivan Gregoretti [mailto:[email protected]]
Sent: Thursday, September 10, 2009 5:01 PM
To: Pratap, Abhishek
Cc: Martin Morgan; [email protected]
Subject: Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data

It sounds like you may have run out of memory on your Linux box.

When I run qa() on my 16 GB machine, it usually uses ~14 GB just for
this qa() process.

That is for 36 bases. Maybe, if you are running 75 bases, you have
simply used up all the RAM.
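Ivan's estimate can be put into numbers. Assuming that memory grows in proportion to reads times cycles (as Martin suggests in his reply) and that the lanes have a similar number of reads, going from 36 to 75 cycles scales the ~14 GB by 75/36, to about 29 GB, more than a 16 GB machine has. A tiny hypothetical helper for that proportional estimate:

```shell
# Hypothetical helper: scale a known memory footprint linearly with read length.
# Assumes memory is proportional to cycles at a fixed number of reads.
# Arguments: known_gb known_cycles new_cycles
scale_memory() {
    awk -v gb="$1" -v c0="$2" -v c1="$3" 'BEGIN { printf "%.1f\n", gb * c1 / c0 }'
}

# Usage: Ivan's ~14 GB at 36 cycles, scaled to 75 cycles
#   scale_memory 14 36 75   # prints 29.2
```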

Is the processor running at 100%? Check it by running 'top' at the
command line. If it is, then you are good.

'top' can also tell you if you are swapping wildly. (Swapping is when
your machine runs out of RAM and starts storing data in a temporary
location on your hard drive to avoid crashing.)
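If top is not convenient, the same memory and swap figures can be read directly from /proc on Linux. A minimal sketch, assuming a kernel that reports VmRSS and VmSwap in /proc/<pid>/status (VmSwap only appears in newer kernels); the helper name mem_report is hypothetical. A large or growing VmSwap for the R process, or SwapFree shrinking in /proc/meminfo, points to the heavy swapping described above.

```shell
# Hypothetical helper: report resident and swapped-out memory for a process,
# plus system-wide swap totals, straight from /proc (Linux only).
mem_report() {
    pid="$1"
    status="/proc/$pid/status"
    if [ ! -r "$status" ]; then
        echo "cannot read $status" >&2
        return 1
    fi
    # Per process: resident set size and amount swapped out (kB).
    grep -E '^(VmRSS|VmSwap):' "$status"
    # System wide: total and free swap (kB).
    grep -E '^(SwapTotal|SwapFree):' /proc/meminfo
}

# Usage: mem_report 12345   # PID of the R process, e.g. taken from top
```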

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

On Thu, Sep 10, 2009 at 4:35 PM, Pratap, Abhishek
<[email protected]> wrote:
Hi Martin,

I am noticing slow or possibly hung processing with the qa()
function in ShortRead. I know I have raised this question before.
Recently I updated R to the dev version and installed the latest
Bioconductor. Currently I am trying to run qa() on 8 lanes of data
with 75 bp reads. The machine has 16 CPU cores and 16 GB RAM.

The processing has been going on for two hours. Does it usually
take this long? I am not sure. Will using Rmpi help?

Thanks,
-Abhi

sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.2.1   lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.8
[5] IRanges_1.2.3

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.2    hwriter_1.1


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
