Hi.

On Wed, Sep 19, 2012 at 5:36 AM, S. A. Haider <shazi...@gmail.com> wrote:
> Hi Guys,
>
> Following preprocessing (doCRMAv2), I am running segmentation using CBS. I
> have a few hundred samples (~600 SNP6) and, as far as I understand, aroma
> tries to keep memory usage to a minimum. So far, it looks like it will take
> roughly 2.5 days to run this analysis. Is there a way to run the
> segmentation (CBS fit()) faster? I have around 200GB of RAM and am hoping
> that I can use this to speed up the processing by avoiding I/O. Any suggestions?

See http://aroma-project.org/settings/ for aroma settings, including
the "memory/ram" one.

However, the CbsModel is processed sample by sample and chromosome by
chromosome, so the memory overhead is already low, and I'm quite sure
(without looking at the code) that it is not affected by "memory/ram".
Most of the time spent on the segmentation process is actually spent
in the segmentation algorithm itself.  There may be a tiny overhead
from generating verbose statements and possibly from accessing the input
data file once per chromosome (instead of once per genome), but again,
that should be minor.
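
To check this on your own data, one option is to time a single array
and extrapolate.  The following is only a sketch: estimateTotalHours()
is a made-up helper (not part of aroma), and it assumes that every
array takes about the same time and that the work splits evenly across
cores.

```r
# Hypothetical helper (not part of aroma): time one call of 'fitOne' and
# extrapolate to 'nbrOfArrays' arrays spread evenly over 'nbrOfCores' cores.
estimateTotalHours <- function(fitOne, nbrOfArrays, nbrOfCores=1L) {
  secs <- system.time(fitOne())[["elapsed"]];  # wall time for one call
  secs * nbrOfArrays / nbrOfCores / 3600;      # seconds -> hours
}
```

For instance, estimateTotalHours(function() doCBS(dsN, arrays=1L),
nbrOfArrays=length(dsN), nbrOfCores=2L), with 'dsN' set up as in the
script below.  For reference, ~2.5 days for ~600 arrays corresponds to
about 6 minutes per array.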

What you can do is to run CBS in parallel.  Here is a sketch running
it on two cores on your local machine:

library("aroma.affymetrix");
library("parallel");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Set up the CRMAv2 output data set
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
dataSet <- "GSE8605";
chipType <- "Mapping10K_Xba142";
tags <- "ACC,-XY,BPN,-XY,AVG,FLN,-XY"; # From doASCRMAv2()
dsN <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType=chipType);
print(dsN);

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Segment total CNs
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Allocate compute cluster with two nodes
cl <- makeCluster(2L);
print(cl);

# Share necessary information with the compute nodes
clusterExport(cl, "dsN");

# Ask the compute nodes to run CBS on individual arrays
res <- parLapply(cl, X=seq_along(dsN), fun=function(ii) {
  library("aroma.affymetrix");
  doCBS(dsN, arrays=ii);
});

# Shut down the compute nodes when done
stopCluster(cl);

To run it on different machines, see the help of the 'parallel' package.
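
As a rough sketch of what that can look like: makeCluster() also
accepts a vector of host names and starts one worker per entry.  The
host names below are placeholders; "localhost" is used so that the
snippet runs as is.  For remote machines this assumes passwordless SSH
access, that R and aroma.affymetrix are installed on each machine, and
that every worker sees the same aroma directories (e.g. via a shared
file system).  None of that is covered in this message, so treat it as
assumptions to verify for your setup.

```r
library("parallel");

# Placeholder host names; replace with your own machines.
# Listing "localhost" twice starts two workers on the local machine.
hosts <- c("localhost", "localhost");
cl <- makeCluster(hosts);  # one worker per entry in 'hosts'

# Check that every worker responds by asking for its node name
nodes <- clusterCall(cl, function() Sys.info()[["nodename"]]);
print(nodes);

# Shut down the workers when done
stopCluster(cl);
```

Once the workers respond, the clusterExport() and parLapply() calls
from the script above can be used with this cluster in the same way.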

WARNING TO EVERYONE: The CbsModel processes each array independently
(it is truly parallel), so it is safe to run in parallel. Do not try to
run other aroma steps in parallel and expect them to work out of the
box - the risks for race conditions are many and in the worst case
aroma won't detect them (despite a multitude of internal checks) and
you'll end up with corrupt results.  I'm in the process of identifying
more cases and documenting them, but until then you're on your own.

/Henrik

>
> thanks
> Syed
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/

