Hi.

On Wed, Sep 19, 2012 at 5:36 AM, S. A. Haider <shazi...@gmail.com> wrote:
> Hi Guys,
>
> Following preprocessing (doCRMAv2), i am running segmentation using CBS. I
> have few hundreds of samples (~600 SNP6) and as far as i understand, aroma
> tries to keep memory usage to minimum. So far, it looks that it will take
> roughly 2.5 days to run this analysis. Is there a way to run the
> segmentation (CBS fit()) faster, i have around 200GB of RAM and hoping that
> i can use this to speed up the processing by avoid I/O ? Any suggestions ?

See http://aroma-project.org/settings/ for aroma settings, including the "memory/ram" one. However, the CbsModel is processed sample by sample and chromosome by chromosome, so the memory overhead is already low, and I'm quite sure (without looking at the code) that it is not affected by "memory/ram".

Most of the processing time is spent in the segmentation algorithm itself. There may be a tiny overhead from generating verbose statements and possibly from accessing the input data file once per chromosome (instead of once per genome), but again, that should be minor.

What you can do is to run CBS in parallel. Here is a sketch running it on two cores on your local machine:

library("aroma.affymetrix");
library("parallel");

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Set up the CRMAv2 output data set
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
dataSet <- "GSE8605";
chipType <- "Mapping10K_Xba142";
tags <- "ACC,-XY,BPN,-XY,AVG,FLN,-XY"; # From doASCRMAv2()
dsN <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType=chipType);
print(dsN);

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Segment total CNs
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Allocate compute cluster with two nodes
cl <- makeCluster(2L);
print(cl);

# Share necessary information with the compute nodes
clusterExport(cl, "dsN");

# Ask the compute nodes to run CBS on individual arrays
res <- parLapply(cl, X=seq(dsN), fun=function(ii) {
  library("aroma.affymetrix");
  doCBS(dsN, arrays=ii);
});

# Shut down the compute nodes when done
stopCluster(cl);

To run it on different machines, see the help of the 'parallel' package.

WARNING TO EVERYONE: The CbsModel is truly parallel, so it is safe to run in parallel.
Do not try to run other aroma steps in parallel and expect them to work out of the box. The risks for race conditions are many, and in the worst case aroma won't detect them (despite a multitude of internal checks) and you'll end up with corrupt results. I'm in the process of identifying more cases and documenting them, but until then you're on your own.

/Henrik

>
> thanks
> Syed
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/