Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data

Harris A. Jaffee Fri, 11 Sep 2009 07:57:29 -0700

On UNIX/Linux, which seems to be the case here, you can follow your R process externally by attaching to it with strace (or truss, trace, or whatever it might be called on your system), i.e. do "strace -p <pid>". This should tell you which file is being read at the moment, and how well. You can also do "ls -l /proc/<pid>/fd" for a snapshot of the open files. Or try top or ps (with the right options).
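
To make attaching strace easier, the script itself can print its process ID at startup. The following is a minimal sketch, assuming a Linux system with a /proc file system. The helper name openFiles is hypothetical; Sys.getpid(), list.files(), dir.exists() and Sys.readlink() are base R functions.

```r
# Print this R process's PID so you can run "strace -p <pid>" from another shell
message("R process ID: ", Sys.getpid())

# Hypothetical helper: list the files an R process currently has open.
# Assumes Linux, where /proc/<pid>/fd holds one symbolic link per open descriptor.
openFiles <- function(pid = Sys.getpid()) {
    fdDir <- file.path("/proc", pid, "fd")
    if (!dir.exists(fdDir))
        return(character(0))          # not Linux, or no access to that process
    fds <- list.files(fdDir, full.names = TRUE)
    Sys.readlink(fds)                 # resolve each descriptor to its target
}

print(openFiles())
```

Calling openFiles() periodically while qa() runs gives roughly the same snapshot as "ls -l /proc/<pid>/fd", without leaving R.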


On Sep 10, 2009, at 6:16 PM, Martin Morgan wrote:

Pratap, Abhishek wrote:

Hi Ivan
I suspected it but am not 100% sure. CPU usage for my R process fluctuates between 60% and 100%, and swap usage looks OK to me.
I remember there was some talk on the mailing list that the devel version (R/ShortRead) is a lot faster.


Hi Abhishek,

My experience is in the range of 5 minutes per lane for qa, so yours
would seem to be running a long time. The ... arguments to qa are
passed to the function that reads individual files (readAligned), so
you can include a verbose=TRUE argument for a bit more chat. You might
write a short script along the lines of

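  gcinfo(TRUE)   # print a message at every garbage collection
  library(ShortRead)
  dirPath <- "some/directory"
  pattern <- "<some_pattern>"
  # make sure the pattern matches exactly the files you expect
  stopifnot(setequal(list.files(dirPath, pattern), <files I'm expecting>))
  qa <- qa(dirPath, pattern, type=<my type>, verbose=TRUE)
  save(qa, file=<some file>)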

Try running this from the command line

  R -f MyScript.R

The gcinfo(TRUE) call will cause R to start printing messages like

Garbage collection 3 = 2+0+1 (level 0) ...
7.4 Mbytes of cons cells used (39%)
1.3 Mbytes of vectors used (21%)
Garbage collection 4 = 3+0+1 (level 0) ...
10.3 Mbytes of cons cells used (55%)

which indicates that R is busy managing its memory even before starting
to do real work. So give R more memory until it quiets down:

  R --min-nsize=20M --min-vsize=4G -f MyScript.R

(These values are my best guess at what is appropriate; 'M' means
'million' and 'G' means 'Giga'.)

qa() should be reading one file at a time, so the memory requirement is
set by the largest lane (the product of reads and cycles). You should be
able to get a handle on the size of that using readAligned().
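
As a rough back-of-the-envelope check, assume each cycle stores one byte for the base call and one byte for the quality score, and ignore R's per-object overhead and any copies made during processing, so treat the result as a lower bound only. The function name laneBytes and the read count in the example are hypothetical, not figures from this thread.

```r
# Hypothetical helper: lower-bound memory estimate for one lane,
# assuming 1 byte per base plus 1 byte per quality score per cycle.
laneBytes <- function(nReads, nCycles, bytesPerCycle = 2) {
    nReads * nCycles * bytesPerCycle
}

# Example: a hypothetical lane of 10 million 75-cycle reads
bytes <- laneBytes(10e6, 75)
cat(sprintf("%.0f bytes, about %.2f GiB\n", bytes, bytes / 2^30))
# prints: 1500000000 bytes, about 1.40 GiB
```

For a real lane, the number of reads and the read width can be taken from the object that readAligned() returns for the largest file.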

How many reads and cycles are there in your data?

Martin

Thanks,
-Abhi

-----Original Message-----
From: Ivan Gregoretti [mailto:[email protected]]
Sent: Thursday, September 10, 2009 5:01 PM
To: Pratap, Abhishek
Cc: Martin Morgan; [email protected]
Subject: Re: [Bioc-sig-seq] Slow/hanged QA on Illumina Data

It sounds like you may have run out of memory on your Linux box.

When I run qa() on my 16GB machine, it usually uses ~14GB just for
this qa() process.

That is for 36 bases. Maybe, if you are running 75 bases, you have
simply used up all the RAM.

Is the processor running at 100%? Check by issuing 'top' at the command
line. If it is, then you are good.

'top' can also tell you if you are swapping wildly. (Swapping is when
your machine runs out of RAM and starts storing data in a temporary
location on your hard drive to avoid crashing.)
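
If you would rather check swap use from within R than watch top, here is a minimal sketch, assuming Linux, where /proc/meminfo reports SwapTotal and SwapFree in kB. This is a swapped-in technique rather than something suggested above, and swapUsedKB is a hypothetical helper name.

```r
# Hypothetical helper: swap currently in use, in kB, parsed from /proc/meminfo.
# Assumes Linux; lines look like "SwapTotal:       8388604 kB".
swapUsedKB <- function(lines = readLines("/proc/meminfo")) {
    field <- function(name) {
        hit <- grep(paste0("^", name, ":"), lines, value = TRUE)
        as.numeric(gsub("[^0-9]", "", hit))   # keep only the digits
    }
    field("SwapTotal") - field("SwapFree")
}

if (file.exists("/proc/meminfo")) print(swapUsedKB())
```

A value that keeps growing while qa() runs is the "swapping wildly" case described above.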

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

sessionInfo()

R version 2.10.0 Under development (unstable) (2009-08-12 r49169)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

On Thu, Sep 10, 2009 at 4:35 PM, Pratap, Abhishek
<[email protected]> wrote:

Hi Martin

I am noticing sluggish or possibly hung processing with the qa()
function in ShortRead. I know I have raised this question before.
Recently I updated my R to the dev version and installed the latest
Bioconductor. Currently I am trying to run qa() on 8 lanes of data for
75 bp reads. The machine has 16 cores and 16 GB RAM.

The processing has been going on for two hours. Does it usually take
this long? I am not sure. Will using Rmpi help?

Thanks,
-Abhi

sessionInfo()

R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.2.1   lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.8
[5] IRanges_1.2.3

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.2    hwriter_1.1


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
