Re: [aroma.affymetrix] exon level data.frame dimensions

Henrik Bengtsson Mon, 11 Jul 2011 16:23:51 -0700

Hi.

On Mon, Jul 11, 2011 at 5:33 AM, Anita <anita.grigoria...@kcl.ac.uk> wrote:
> Dear group,
>
> I would like to extract a data.frame on the exon level and followed
> these commands (see below), surprisingly the data.frame had these
> dimensions:
>
> 326983 249
>
> while I expected to have a data.frame with
> 1.4 Mill 249 samples
>
> Could you please advise me how I can extract the data for all exons
> from this analysis?


You did indeed get all the data.  You are using a custom CDF that
defines only 22035 units (here "transcripts"):

> cdf
AffymetrixCdfFile:
Path: annotationData/chipTypes/HuEx-1_0-st-v2
Filename: HuEx-1_0-st-v2,U-Ensembl49,G-Affy.cdf
Filesize: 44.04MB
Chip type: HuEx-1_0-st-v2,U-Ensembl49,G-Affy
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 2560x2560
Number of cells: 6553600
Number of units: 22035
Cells per unit: 297.42
Number of QC units: 1

Each unit contains one or more groups, which here correspond to
"exons".  For instance, for unit #34 there are 16 groups:

> data <- readUnits(cdf, units=34);
> print(names(data));
[1] "ENSG00000003096"
> print(names(data[["ENSG00000003096"]]$groups));
 [1] "4019161" "4019162" "4019163" "4019164" "4019167"
 [6] "4019169" "4019170" "4019173" "4019174" "4019175"
[11] "4019176" "4019177" "4019179" "4019180" "4019196"
[16] "4019197"
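
To make the unit/group nesting concrete, here is a minimal sketch in
plain R.  It builds a toy list with the same shape as the readUnits()
result above; the unit and group names are copied from that output, but
the group contents are left empty (an assumption, since they are not
shown here).

```r
# Toy stand-in for the list returned by readUnits(cdf, units=34) above.
# Only the names come from the output; the group contents are NULL
# placeholders because they are not shown in this thread.
groupNames <- c("4019161", "4019162", "4019163", "4019164", "4019167",
                "4019169", "4019170", "4019173", "4019174", "4019175",
                "4019176", "4019177", "4019179", "4019180", "4019196",
                "4019197")
groups <- vector("list", length(groupNames))
names(groups) <- groupNames
data <- list(ENSG00000003096 = list(groups = groups))

# Number of units ("transcripts") in the toy list
nbrOfUnits <- length(data)

# Number of groups ("exons") in each unit
nbrOfGroupsPerUnit <- sapply(data, function(unit) length(unit$groups))

print(nbrOfUnits)
print(nbrOfGroupsPerUnit)
```

Counting groups per unit like this, over all units, is the idea behind
the total discussed next.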

In other words, the data frame that you get in the end *when using
this particular CDF* contains one row per group, that is, 22035
transcripts * <avg number of exons per transcript> (about 14.8), which
here comes to 326983 exons.  One way to get this count without
processing all your data is:

> library("aroma.affymetrix");
> chipType <- "HuEx-1_0-st-v2";
> cdf <- AffymetrixCdfFile$byChipType(chipType,tags="U-Ensembl49,G-Affy");
> nbrOfGroupsPerUnit <- getUnitSizes(cdf);
> sum(nbrOfGroupsPerUnit);
[1] 326983
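
As a rough illustration of why the row count equals the sum of the
group counts, here is a small sketch in plain R.  The three unit names
and their group counts are made up for the example; with the real CDF
the counts would come from getUnitSizes(cdf) as above.

```r
# Hypothetical group counts for three made-up units; the real values
# would come from getUnitSizes(cdf).
nbrOfGroupsPerUnit <- c(unitA = 16L, unitB = 3L, unitC = 8L)

# One row per (unit, group) pair, which is the layout of an exon-level
# data frame when groups are not merged.
rows <- data.frame(
  unit  = rep(names(nbrOfGroupsPerUnit), times = nbrOfGroupsPerUnit),
  group = sequence(unname(nbrOfGroupsPerUnit)),
  stringsAsFactors = FALSE
)

# The number of rows equals the total number of groups
stopifnot(nrow(rows) == sum(nbrOfGroupsPerUnit))
print(nrow(rows))
```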

To conclude, it is important that you pick the custom CDF that fits
your needs, and that you understand the objectives it was designed
for.  If the above about units and groups in Affymetrix CDFs is
unclear, I recommend that you read up on that too (see for instance
the affxparser package).
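
If you want to poke at the CDF directly, something along these lines
might be a starting point.  It is only a sketch: it assumes the CDF
sits at the path and filename printed by 'cdf' above, relative to the
working directory, and it does nothing if affxparser or the file is
missing.

```r
# Pathname assembled from the 'Path' and 'Filename' fields printed above;
# that it is relative to the current working directory is an assumption.
pathname <- file.path("annotationData", "chipTypes", "HuEx-1_0-st-v2",
                      "HuEx-1_0-st-v2,U-Ensembl49,G-Affy.cdf")

if (requireNamespace("affxparser", quietly = TRUE) && file.exists(pathname)) {
  # Names of the groups ("exons") in unit #34
  groupNames <- affxparser::readCdfGroupNames(pathname, units = 34)
  str(groupNames)

  # Number of cells (probes) in each group of unit #34
  nbrOfCells <- affxparser::readCdfNbrOfCellsPerUnitGroup(pathname, units = 34)
  str(nbrOfCells)
} else {
  message("affxparser or the CDF file is not available; nothing to read.")
}
```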

Hope this helps

Henrik

>
>
> Many thanks for your advice in advance,
>
> best wishes,
>
> Anita
>
>
>
>
> library(aroma.affymetrix)
> library(Biobase)
> library(limma)
> library(affy)
> library(biomaRt)
>
>
> ##############################################################################
> verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
> chipType <- "HuEx-1_0-st-v2"
> cdf <- AffymetrixCdfFile$byChipType(chipType,tags="U-Ensembl49,G-Affy")
> print(cdf)
>
> cs <- AffymetrixCelSet$byName("Affy_Exon_June2011", cdf=cdf)
> print(cs)
>
> setCdf(cs,cdf)
>
> bc <- RmaBackgroundCorrection(cs,tag="U-Ensembl49,G-Affy")
> csBC <- process(bc,verbose=verbose)
>
> qn <- QuantileNormalization(csBC,typesToUpdate="pm")
> print(qn)
>
> csN <- process(qn,verbose=verbose)
>
> getCdf(csN)
>
> plmEx <- ExonRmaPlm(csN,mergeGroups=FALSE)
> print(plmEx)
>
> fit(plmEx,verbose=verbose)
>
> cesEx <- getChipEffectSet(plmEx)
> exFit <- extractDataFrame(cesEx,units=NULL,addNames=TRUE)
> dim(exFit)
> #326983 - 249 - exon based data
> ##############################################################################
>
>
>
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] biomaRt_2.6.0          aroma.affymetrix_2.1.0 aroma.apd_0.1.8
>  [4] affxparser_1.22.1      R.huge_0.2.2           aroma.core_2.1.0
>  [7] aroma.light_1.20.0     matrixStats_0.2.2      R.rsp_0.5.3
> [10] R.cache_0.4.2          R.filesets_1.0.1       digest_0.5.0
> [13] R.utils_1.7.5          R.oo_1.8.0             affy_1.28.1
> [16] R.methodsS3_1.2.1      limma_3.6.9            Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0         preprocessCore_1.12.0 RCurl_1.6-6
> [4] tcltk_2.12.1          tools_2.12.1          XML_3.4-0
>
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
