Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

Henrik Bengtsson Tue, 04 Mar 2014 17:33:06 -0800

Did lowering "memory/ram" solve your problem?

Also, an updated version of affxparser, which should no longer overflow
in the integer multiplication, is available (on Bioconductor).
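
For readers hitting the same warning: the sketch below reproduces the
overflow from the DETAILS section quoted further down and shows one generic
way to sidestep it in plain R, namely doing the multiplication in double
precision. It is only an illustration under that assumption, not the actual
change made in affxparser.

```r
# R integers are 32-bit; a product above .Machine$integer.max becomes NA.
nbrOfArrays <- 5622L
nbrOfCells <- 381979L

# Integer * integer overflows (the warning is suppressed here on purpose)
nInt <- suppressWarnings(nbrOfCells * nbrOfArrays)
print(is.na(nInt))

# Coercing one operand to double keeps the exact product
nDbl <- as.double(nbrOfCells) * nbrOfArrays
print(nDbl > .Machine$integer.max)
```

A length computed this way exceeds the 32-bit integer range, so whether a
vector of that size can then be allocated depends on the R version and on
available memory.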


Cheers,

Henrik

On Thu, Feb 27, 2014 at 12:36 PM, Henrik Bengtsson
<henrik.bengts...@aroma-project.org> wrote:
> Congratulations Damian,
>
> I think you're the first one to hit a limit of the Aroma Framework
> (remind me to buy you a drink whenever you see me in person).
>
> I narrowed it down to the affxparser(*) package and I'll investigate
> further on how to fix this.  It should not occur and I'm confident
> that it can be avoided internally.  In the meantime, try to lower
> your 'memory/ram' setting, e.g. setOption(aromaSettings, "memory/ram",
> 10.0) or less.  I'm not 100% sure it'll help, but if it does, that's a
> good clue (for me) on what's causing it.
>
> /Henrik
>
> DETAILS: The below illustrates the issue in affxparser::readCelUnits():
>
>> .Machine$integer.max
> [1] 2147483647
>> nbrOfArrays <- 5622L
>> .Machine$integer.max / nbrOfArrays
> [1] 381978.6
>> nbrOfCells <- 381978L
>> nbrOfCells * nbrOfArrays
> [1] 2147480316
>> nbrOfCells <- 381979L
>> nbrOfCells * nbrOfArrays
> [1] NA
> Warning message:
> In nbrOfCells * nbrOfArrays : NAs produced by integer overflow
>
> By decreasing 'memory/ram' I *hope* that 'nbrOfCells' effectively
> becomes smaller.
>
>
> On Wed, Feb 26, 2014 at 9:15 PM, Damian Plichta
> <damian.plic...@gmail.com> wrote:
>> Hi Henrik,
>>
>> Thank you, that was helpful.
>>
>> I ran into another problem, though. I am trying to perform ExonRmaPlm(csQN,
>> mergeGroups=TRUE), but this produces the following error:
>>
>> 20140226 23:25:33|       Identifying CDF cell indices...done
>> Error in vector("double", nbrOfCells * nbrOfArrays) :
>>   vector size cannot be NA
>> In addition: Warning message:
>> In nbrOfCells * nbrOfArrays : NAs produced by integer overflow
>> 20140226 23:28:35|      Reading probe intensities from 5622 arrays...done
>> 20140226 23:28:35|     Fitting chunk #1 of 1 of 'expression' units (code=1)
>> with various dimensions...done
>> 20140226 23:28:35|    Unit dimension #3 (various dimensions) of 3...done
>> 20140226 23:28:35|   Fitting the model by unit dimensions (at least for the
>> large classes)...done
>> 20140226 23:28:35|  Unit type #1 ('expression') of 1...done
>> 20140226 23:28:35| Fitting ExonRmaPlm for each unit type separately...done
>> 20140226 23:28:35|Fitting model of class ExonRmaPlm...done
>>
>> I tested whether it worked anyway, but the expression is zero across all
>> arrays when I access it.
>>
>> Do you know what could be causing the problem?
>>
>> Best,
>> Damian
>>
>>
>> The code I run is below:
>>
>> library(aroma.affymetrix)
>> library(aroma.core)
>>
>> setOption(aromaSettings, "memory/ram", 500.0)
>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>>
>> chipType <- "HuEx-1_0-st-v2-core"
>> cdf <- AffymetrixCdfFile$byChipType(chipType)
>> #print(cdf)
>>
>> cs <- AffymetrixCelSet$byName("experiment1", cdf=cdf)
>>
>> bc <- RmaBackgroundCorrection(cs)
>> csBC <- process(bc, verbose=verbose)
>>
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>> target <- getTargetDistribution(qn, verbose=verbose)
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm",
>>                             targetDistribution=target)
>> csQN <- process(qn, verbose=verbose)
>>
>> csPLM <- ExonRmaPlm(csQN, mergeGroups=TRUE)
>> fit(csPLM, verbose=verbose)
>> date()
>>
>> ces <- getChipEffectSet(csPLM)
>> gExprs <- extractDataFrame(ces, units=1:3, addNames=TRUE)
>>
>>
>>> sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=C                 LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] preprocessCore_1.23.0   aroma.light_1.31.8      matrixStats_0.8.14
>>  [4] aroma.affymetrix_2.11.1 aroma.core_2.11.0       R.devices_2.8.2
>>  [7] R.filesets_2.3.0        R.utils_1.29.8          R.oo_1.17.0
>> [10] affxparser_1.34.0       R.methodsS3_1.6.1
>>
>> loaded via a namespace (and not attached):
>> [1] aroma.apd_0.4.0 base64enc_0.1-1 digest_0.6.4    DNAcopy_1.35.1
>> [5] PSCBS_0.40.4    R.cache_0.9.2   R.huge_0.6.0    R.rsp_0.9.28
>> [9] tools_3.0.2
>>
>> On Thursday, February 20, 2014 1:21:25 PM UTC-5, Henrik Bengtsson wrote:
>>>
>>> On Tue, Feb 18, 2014 at 7:30 PM, Damian Plichta
>>> <damian....@gmail.com> wrote:
>>> > Thanks, that helped a lot. It took me less than 3 hours to perform the
>>> > background correction.
>>> >
>>> > Now I'm wondering if for the next step, quantile normalization, I could
>>> > do a similar trick. Is there a way to precompute the target empirical
>>> > distribution based on all arrays and then do the normalization on chunks
>>> > of data (thus in an independent manner)? I can see the option
>>> > targetDistribution under QuantileNormalization.
>>>
>>> # Calculate the target distribution based on *all* arrays [not parallelized]
>>> qn <- QuantileNormalization(dsC, typesToUpdate="pm")
>>> target <- getTargetDistribution(qn, verbose=verbose)
>>>
>>> # Normalize array by array toward the same target distribution [in chunks]
>>> dsCs <- extract(dsC, 1:100)
>>> qn <- QuantileNormalization(dsCs, typesToUpdate="pm",
>>> targetDistribution=target)
>>> csNs <- process(qn, verbose=verbose)
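
The chunked step above could be wrapped in a loop along these lines. This is
a minimal sketch, not part of aroma.affymetrix: normalizeInChunks() is a
hypothetical helper, the chunk size of 100 is arbitrary, and it assumes that
length(dsC) returns the number of arrays (pass n explicitly if it does not).

```r
# Assumes library(aroma.affymetrix) is loaded and that 'dsC' and 'target'
# exist as in the quoted code. normalizeInChunks() is a hypothetical helper.
normalizeInChunks <- function(dsC, target, chunkSize = 100L,
                              n = length(dsC), verbose = FALSE) {
  # First index of each chunk: 1, 1 + chunkSize, ...
  starts <- seq(from = 1L, to = n, by = chunkSize)
  lapply(starts, function(start) {
    # The last chunk may be shorter than chunkSize
    idxs <- start:min(start + chunkSize - 1L, n)
    dsCs <- extract(dsC, idxs)
    qn <- QuantileNormalization(dsCs, typesToUpdate = "pm",
                                targetDistribution = target)
    process(qn, verbose = verbose)
  })
}
```

Because every chunk is normalized toward the same precomputed target, the
chunks do not depend on each other, so the iterations could also be run on
separate machines instead of in one loop.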
>>>
>>> Hope this helps
>>>
>>> /Henrik
>>>
>>> >
>>> > Kind regards,
>>> >
>>> > Damian Plichta
>>> >
>>> > On Monday, February 17, 2014 4:03:54 PM UTC-5, Henrik Bengtsson wrote:
>>> >>
>>> >> Hi.
>>> >>
>>> >> On Sun, Feb 16, 2014 at 6:53 PM, Damian Plichta
>>> >> <damian....@gmail.com> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I'm processing around 5500 Affymetrix exon arrays. The
>>> >> > RmaBackgroundCorrection() is pretty slow, 1-2 minutes/array. I played
>>> >> > with setOption(aromaSettings, "memory/ram", X) and increased X up to
>>> >> > 100, but it didn't have any effect on this stage of analysis.
>>> >>
>>> >> If you don't notice any difference in processing time by changing
>>> >> "memory/ram" from the default (1.0) to 100, then the memory is not
>>> >> your bottleneck.
>>> >> >
>>> >> > Any way to speed the process up?
>>> >>
>>> >> If you haven't already, make sure to read "How to: Improve processing
>>> >> time":
>>> >>
>>> >>   http://aroma-project.org/howtos/ImproveProcessingTime
>>> >>
>>> >> If you have access to multiple machines on the same file system, you
>>> >> can do poor man's parallel processing for the *background correction*,
>>> >> because each array is corrected independently of the others.  You can
>>> >> do this by processing a subset of arrays per computer, e.g.
>>> >>
>>> >> dsR <- AffymetrixCelSet$byName("MyDataSet", chipType="HuEx-1_0-st-v2")
>>> >> dsS <- extract(dsR, 1:100)
>>> >> bg <- RmaBackgroundCorrection(dsS)
>>> >> dsC <- process(bg, verbose=verbose)
>>> >>
>>> >> Repeat on another machine with 101:200, and so on.
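
One way to derive the per-machine index ranges (1:100, 101:200, ...) without
typing them by hand is sketched below in plain R. The total of 5500 arrays is
the approximate figure from this thread, and machineId is a hypothetical
label for whichever computer runs the script.

```r
nbrOfArrays <- 5500L  # approximate count mentioned in this thread
chunkSize <- 100L     # arrays per machine; arbitrary choice

# Split 1..nbrOfArrays into consecutive chunks of at most chunkSize indices
idxs <- seq_len(nbrOfArrays)
chunks <- split(idxs, ceiling(idxs / chunkSize))

# Hypothetical: each machine picks its own chunk by position
machineId <- 2L
myIdxs <- chunks[[machineId]]
print(range(myIdxs))

# The subset would then be used as in the quoted code, e.g.
#   dsS <- extract(dsR, myIdxs)
```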
>>> >>
>>> >> When all arrays have been background corrected, you can move back to
>>> >> your original script - all arrays background corrected are already
>>> >> saved to file and will therefore not be redone.
>>> >>
>>> >> /Henrik
>>> >>
>>> >> >
>>> >> > Kind regards,
>>> >> >
>>> >> > Damian Plichta
>>> >> >
>>> >> > --
>>> >> > --
>>> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>>> >> > latest
>>> >> > version of the package, 2) to report the output of sessionInfo() and
>>> >> > traceback(), and 3) to post a complete code example.
>>> >> >
>>> >> >
>>> >> > You received this message because you are subscribed to the Google
>>> >> > Groups
>>> >> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
>>> >> > To post to this group, send email to aroma-af...@googlegroups.com
>>> >> > To unsubscribe and other options, go to
>>> >> > http://www.aroma-project.org/forum/
>>> >> >
>>> >> > ---
>>> >> > You received this message because you are subscribed to the Google
>>> >> > Groups
>>> >> > "aroma.affymetrix" group.
>>> >> > To unsubscribe from this group and stop receiving emails from it,
>>> >> > send
>>> >> > an
>>> >> > email to aroma-affymetr...@googlegroups.com.
>>> >> > For more options, visit https://groups.google.com/groups/opt_out.
>>> >
>>

