Hi.

On Mon, Dec 20, 2010 at 9:10 AM, James F. Reid <james.r...@ifom-ieo-campus.it> wrote:
> Dear list,
>
> I am currently performing a meta-analysis of different datasets (GEO GSE
> series) and have been writing sample information as dcf format files under
> the 'annotationData/samples' directory however this slows down considerably
> the processing time of running any type of analysis. Here is a short example
> ('testData') involving 10 HG-U133_Plus_2 .CEL files where either no sample
> annotation files are present in the directory or 11 sample annotation files
> are present.

Yes, those sample annotation files do indeed slow things down. The reason is that for every data set loaded, *all* such files in annotationData/samples/ are parsed, and each entry of those files is checked against the data files in the data set you are setting up with ...$byName(). Thus, (i) the more annotation files there are and (ii) the more entries they have, the longer it will take.

I am aware of this and that it is suboptimal. It is on my todo list to come up with a better/faster scheme, and your report bumps up the priority a bit, but you shouldn't expect anything too soon. Instead...

If your data set does not need annotationData/samples/ annotation files, then exclude that directory. If you are linking to a global annotationData/ structure for your annotationData/chipTypes/ etc., then I recommend that you instead create a local annotationData/ directory and inside that link to annotationData/chipTypes/. This will give you the option to have a local annotationData/samples/ specific to each project. That gives you some flexibility.

Hope this helps

Henrik

> Is there a way I can reduce this overhead?
>
> Many thanks.
> James
>
> remove ~/.Rcache, start R:
> ## running with 0 sample annotation files
> library("aroma.affymetrix")
> cdf <- AffymetrixCdfFile$byChipType("HG-U133_Plus_2");
>
> system.time(cs <- AffymetrixCelSet$byName("testData", cdf=cdf));
>    user  system elapsed
>   0.450   0.020   0.527
>
> exit R, remove ~/.Rcache, start R:
>
> ## running with sample annotations (11 files)
> library("aroma.affymetrix")
> cdf <- AffymetrixCdfFile$byChipType("HG-U133_Plus_2");
>
> system.time(cs <- AffymetrixCelSet$byName("testData", cdf=cdf));
>    user  system elapsed
>   16.74    0.10   17.40
>
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  tools     methods
> [8] base
>
> other attached packages:
>  [1] aroma.affymetrix_1.8.0 aroma.apd_0.1.7        affxparser_1.22.0
>  [4] R.huge_0.2.0           aroma.core_1.8.1       aroma.light_1.18.2
>  [7] matrixStats_0.2.2      R.rsp_0.4.0            R.cache_0.3.0
> [10] R.filesets_0.9.1       digest_0.4.2           R.utils_1.6.0
> [13] R.oo_1.7.4             R.methodsS3_1.2.1
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
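
For reference, the per-project layout recommended in the reply (a local annotationData/ that links to the shared chipTypes/ and keeps its own samples/) could be set up roughly as follows. This is only a sketch under assumptions: the helper name setup_local_annotation and all paths are hypothetical and not part of aroma.affymetrix, the demo runs in a throwaway temporary directory, and it assumes R is later started from inside the project directory.

```shell
#!/bin/sh
set -eu

# setup_local_annotation GLOBAL_ANNOT PROJECT_DIR
# Hypothetical helper (not part of aroma.affymetrix): gives a project its own
# annotationData/ while sharing chipTypes/ from a global annotationData/ tree.
setup_local_annotation() {
  global_annot=$1
  project_dir=$2
  link="$project_dir/annotationData/chipTypes"

  # Project-local samples/ directory; only this project's annotation files
  # go here, so other projects' files are never parsed.
  mkdir -p "$project_dir/annotationData/samples"

  # Share chipTypes/ through a symbolic link instead of copying it.
  # Skip if something (including a dangling link) is already there.
  if [ ! -e "$link" ] && [ ! -L "$link" ]; then
    ln -s "$global_annot/chipTypes" "$link"
  fi
}

# Demo in a temporary directory so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/global/annotationData/chipTypes/HG-U133_Plus_2"
setup_local_annotation "$demo/global/annotationData" "$demo/projectA"
ls -l "$demo/projectA/annotationData"
```

A project that needs no sample annotations simply leaves its samples/ directory empty (or removes it), which corresponds to the "0 sample annotation files" timing in the quoted example.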