Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

2014-03-07 Thread Damian Plichta
Hi Henrik,

Lowering memory helped - it's drinks on me when we meet. 

It has been running for approximately 7 days now (ETA for unit type 
'expression': 20140320 23:20:26). With the updated affxparser can I speed 
it up by increasing the memory burden? The current memory allocation is 
approximately 3 Gbytes. I don't want to cancel the run if it means losing 
the progress though.

Best,
Damian


Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

2014-03-07 Thread Henrik Bengtsson
On Fri, Mar 7, 2014 at 8:05 AM, Damian Plichta damian.plic...@gmail.com wrote:
 Hi Henrik,

 Lowering memory helped - it's drinks on me when we meet.

 It has been running for approximately 7 days now (ETA for unit type
 'expression': 20140320 23:20:26). With the updated affxparser can I speed
 it up by increasing the memory burden? The current memory allocation is
 approximately 3 Gbytes. I don't want to cancel the run if it means losing
 the progress though.

PLM fitting is done in chunks of units.  When a new chunk starts, the
estimates of the previous one are guaranteed to have been saved to
disk.  After that you can interrupt the processing at any point and
safely restart.  All previously processed chunks will be skipped; the
current chunk that was interrupted will have to be redone from
scratch.

When you increase the option 'memory/ram' the chunks will be bigger,
that is, more units will be processed per chunk.  Given a fixed
'memory/ram' setting, the number of units per chunk goes down as the
number of arrays increases, e.g. doubling the number of arrays will
halve the number of units processed per chunk.

Increasing 'memory/ram' makes a big difference particularly if there
are only a small number of units per chunk.  Then there is a relatively
large disk I/O overhead for reading probe intensities and storing
parameter estimates.  This is mainly because the file system cannot
possibly cache the content of all 1000s of arrays, i.e. it reads a few
units of one array, then goes to the next array, and so on.  Also, the
more the probes are scattered on the array, the more they are also
scattered in the CEL files, meaning that when reading those units from
one file, the file system has to skip through a large portion of the
file (skipping is cheap, but it is still more efficient to read things
nearby rather than scattered, and it is more likely that the file cache
will be effective).  When you increase 'memory/ram' you read more
units at a time and therefore lower the fraction of skipped bytes
versus read ones.  This is what I believe brings the most speedup when
increasing 'memory/ram'.  Eventually I *think* this payoff becomes
relatively small and there is little or no need to increase
'memory/ram' further.

So, yes, you can interrupt your script, update affxparser, increase
'memory/ram', restart R, and restart your script safely.  After each
chunk is completed, some timing statistics on read, write, and fitting
overhead are reported in addition to the ETA estimate.  Have a look at
those to see if changing the settings makes a difference.  Please
report back to share your experience - there are some other user
benchmarks related to this on
http://aroma-project.org/howtos/ImproveProcessingTime
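In script form, that interrupt/update/restart cycle might look as
follows.  This is only a sketch: the 200.0 value is purely
illustrative, and biocLite() is assumed as the Bioconductor installer
of that era.

```r
## Interrupt the running script first (the in-progress chunk is lost,
## completed chunks remain on disk).

## In a fresh R session: update affxparser from Bioconductor.
source("http://bioconductor.org/biocLite.R")
biocLite("affxparser")

## Raise 'memory/ram' so that more units are processed per chunk
## (the default is 1.0; 200.0 here is just an example value).
library(aroma.affymetrix)
setOption(aromaSettings, "memory/ram", 200.0)

## Then re-run the original script; previously completed chunks are
## detected on disk and skipped.
```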

/Henrik



 Best,
 Damian



Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

2014-03-04 Thread Henrik Bengtsson
Did lowering memory/ram solve your problem?

Also, an updated version of affxparser that should no longer overflow
in the integer multiplication is available (on Bioconductor).

Cheers,

Henrik


Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

2014-02-27 Thread Henrik Bengtsson
Congratulations Damian,

 I think you're the first one to hit a limit of the Aroma Framework
 (remind me to buy you a drink whenever you see me in person).

 I narrowed it down to the affxparser(*) package and I'll investigate
 further on how to fix this.  It should not occur and I'm confident
 that it can be avoided internally.  In the meantime, try to lower
 your 'memory/ram' setting, e.g. setOption(aromaSettings, "memory/ram",
 10.0) or less.  I'm not 100% sure it'll help, but if it does, that's a
 good clue (for me) on what's causing it.

/Henrik

DETAILS: The following illustrates the issue in affxparser::readCelUnits():

> .Machine$integer.max
[1] 2147483647
> nbrOfArrays <- 5622L
> .Machine$integer.max / nbrOfArrays
[1] 381978.6
> nbrOfCells <- 381978L
> nbrOfCells * nbrOfArrays
[1] 2147480316
> nbrOfCells <- 381979L
> nbrOfCells * nbrOfArrays
[1] NA
Warning message:
In nbrOfCells * nbrOfArrays : NAs produced by integer overflow

By decreasing 'memory/ram' I *hope* that 'nbrOfCells' effectively
becomes smaller.
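As a plain-R illustration (nothing aroma-specific) of a possible
workaround until the fix is out: coercing one operand to double avoids
the 32-bit integer limit.

```r
nbrOfArrays <- 5622L
nbrOfCells <- 381979L

## integer * integer overflows past .Machine$integer.max (2^31 - 1)
prodInt <- nbrOfCells * nbrOfArrays  # NA, with an overflow warning

## coercing either operand to double keeps the full product
prodDbl <- as.numeric(nbrOfCells) * nbrOfArrays
prodDbl  # 2147485938
```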



Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

2014-02-26 Thread Damian Plichta
Hi Henrik,

Thank you, that was helpful. 

I ran into another problem though. I am trying to perform ExonRmaPlm(csQN, 
mergeGroups=TRUE) but it produces the following error:

20140226 23:25:33|   Identifying CDF cell indices...done
Error in vector(double, nbrOfCells * nbrOfArrays) :
  vector size cannot be NA
In addition: Warning message:
In nbrOfCells * nbrOfArrays : NAs produced by integer overflow
20140226 23:28:35|  Reading probe intensities from 5622 arrays...done
20140226 23:28:35| Fitting chunk #1 of 1 of 'expression' units (code=1) 
with various dimensions...done
20140226 23:28:35|Unit dimension #3 (various dimensions) of 3...done
20140226 23:28:35|   Fitting the model by unit dimensions (at least for the 
large classes)...done
20140226 23:28:35|  Unit type #1 ('expression') of 1...done
20140226 23:28:35| Fitting ExonRmaPlm for each unit type separately...done
20140226 23:28:35|Fitting model of class ExonRmaPlm...done

I tested whether it worked anyway, but the expression is zero across all 
arrays when I access it.

Do you know what could be causing the problem?

Best,
Damian


The code I ran is below:

library(aroma.affymetrix)
library(aroma.core)

setOption(aromaSettings, "memory/ram", 500.0)

verbose <- Arguments$getVerbose(-8, timestamp=TRUE)

chipType <- "HuEx-1_0-st-v2-core"
cdf <- AffymetrixCdfFile$byChipType(chipType)
#print(cdf)
cs <- AffymetrixCelSet$byName("experiment1", cdf=cdf)

bc <- RmaBackgroundCorrection(cs)
csBC <- process(bc, verbose=verbose)

qn <- QuantileNormalization(csBC, typesToUpdate="pm")
target <- getTargetDistribution(qn, verbose=verbose)
qn <- QuantileNormalization(csBC, typesToUpdate="pm",
                            targetDistribution=target)
csQN <- process(qn, verbose=verbose)

csPLM <- ExonRmaPlm(csQN, mergeGroups=TRUE)
fit(csPLM, verbose=verbose)
date()

ces <- getChipEffectSet(csPLM)
gExprs <- extractDataFrame(ces, units=1:3, addNames=TRUE)

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] preprocessCore_1.23.0   aroma.light_1.31.8      matrixStats_0.8.14
 [4] aroma.affymetrix_2.11.1 aroma.core_2.11.0       R.devices_2.8.2
 [7] R.filesets_2.3.0        R.utils_1.29.8          R.oo_1.17.0
[10] affxparser_1.34.0       R.methodsS3_1.6.1

loaded via a namespace (and not attached):
[1] aroma.apd_0.4.0 base64enc_0.1-1 digest_0.6.4    DNAcopy_1.35.1
[5] PSCBS_0.40.4    R.cache_0.9.2   R.huge_0.6.0    R.rsp_0.9.28
[9] tools_3.0.2

On Thursday, February 20, 2014 1:21:25 PM UTC-5, Henrik Bengtsson wrote:

 On Tue, Feb 18, 2014 at 7:30 PM, Damian Plichta 
 damian@gmail.com wrote: 
  Thanks, that helped a lot. It took me less than 3 hours to perform the 
  background correction. 
  
  Now I'm wondering if for the next step, quantile normalization, I could 
 do a 
  similar trick. Is there a way to precompute the target empirical 
  distribution based on all arrays and then do the normalization on chunks 
 of 
  data (thus in an independent manner)? I can see the option 
  targetDistribution under QuantileNormalization. 

 # Calculate the target distribution based on *all* arrays [not parallelized] 
 qn <- QuantileNormalization(dsC, typesToUpdate="pm") 
 target <- getTargetDistribution(qn, verbose=verbose) 

 # Normalize array by array toward the same target distribution [in chunks] 
 dsCs <- extract(dsC, 1:100) 
 qn <- QuantileNormalization(dsCs, typesToUpdate="pm", 
                             targetDistribution=target) 
 csNs <- process(qn, verbose=verbose) 

 Hope this helps 

 /Henrik 

  
  Kind regards, 
  
  Damian Plichta 
  
  On Monday, February 17, 2014 4:03:54 PM UTC-5, Henrik Bengtsson wrote: 
  
  Hi. 
  
  On Sun, Feb 16, 2014 at 6:53 PM, Damian Plichta 
  damian@gmail.com wrote: 
   Hi, 
   
   I'm processing around 5500 Affymetrix exon arrays. The 
   RmaBackgroundCorrection() is pretty slow, 1-2 minutes/array. I played 
   with setOption(aromaSettings, "memory/ram", X) and increased X up to 
   100, but it didn't have any effect on this stage of the analysis. 
  
  If you don't notice any difference in processing time when changing 
  'memory/ram' from the default (1.0) to 100, then memory is not 
  your bottleneck. 
   
   Any way to speed the process up? 
  
  If you haven't already, make sure to read How to: Improve processing 
  time: 
  
http://aroma-project.org/howtos/ImproveProcessingTime 
  
  If you have access to multiple machines on the same file system, you 
  can do poor man's parallel processing for the *background correction*, 
  because each array is corrected independently of the others.  You can 
  do this by
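
The message is truncated here, but based on the extract() idiom shown
earlier in the thread, such a split might look like the following
sketch.  The two-machine setup and the array ranges are purely
illustrative.

```r
## Machine 1 (its own R session): background-correct the first half.
csA <- extract(cs, 1:2811)
bcA <- RmaBackgroundCorrection(csA)
process(bcA, verbose=verbose)

## Machine 2 (its own R session): background-correct the second half.
csB <- extract(cs, 2812:5622)
bcB <- RmaBackgroundCorrection(csB)
process(bcB, verbose=verbose)

## Afterwards, rerun on the full set; arrays already corrected on the
## shared file system are detected and skipped.
```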