Re: [aroma.affymetrix] Combining data from multiple chip types

Henrik Bengtsson Fri, 02 Dec 2011 12:49:09 -0800

Hi.

Yes, the Aroma framework can handle this.

On Fri, Dec 2, 2011 at 12:19 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
> Hi all,
>
> I am running an analysis on Affymetrix SNP6, 250K Nsp and 250K Sty chip types.
> For various reasons, patient samples were assessed either on SNP6 chips or
> on 500K chipsets (250K Nsp and 250K Sty).  To further complicate things,
> an occasional 250K Nsp chip processing failed, so some patients have data
> only on a 250K Sty chip.

Ok, so each sample is processed on either of:

1. GenomeWideSNP_6
2. Mapping250K_Sty
3. Mapping250K_Sty & Mapping250K_Nsp
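
One way to see which of these configurations each patient falls into is to tabulate the chip types per sample. Below is a minimal base R sketch. It assumes a hypothetical inventory data frame (one row per array, with its sample and chip type) that you would build from your own CEL file listing; the sample names are made up.

```r
# Hypothetical inventory: one row per array, with the sample it belongs to
# and the chip type it was hybridized on.
arrays <- data.frame(
  sample   = c("P01", "P02", "P02", "P03"),
  chipType = c("GenomeWideSNP_6", "Mapping250K_Nsp", "Mapping250K_Sty",
               "Mapping250K_Sty"),
  stringsAsFactors = FALSE
)

# Collapse the chip types available for each sample into one label,
# e.g. "Mapping250K_Nsp & Mapping250K_Sty".
configs <- tapply(arrays$chipType, arrays$sample,
                  function(x) paste(sort(unique(x)), collapse = " & "))
print(configs)

# Number of samples per configuration.
print(table(configs))
```

Samples whose label lists more than one chip type are the ones for which the segmentation step will have data from multiple chip types to merge.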

>
> I see on the web page
>
>   http://www.aroma-project.org/features
>
> the description
>
> COPY-NUMBER ANALYSIS:
> * Paired & non-paired copy-number analysis: All generations, i.e. 10K, 100K,
>   500K, 5.0 & 6.0.
> * CBS & GLAD segmentation methods.
> * Combine data from multiple chip types.
>
>
> My question is, at what point can data from multiple chip types be combined?
>
> As I start my aroma.affymetrix analytic pipeline (shown below), I first
> process the GenomeWideSNP_6 chips, then the 250K Nsp, then the 250K Sty.
> Is this appropriate, or is there a way to combine processing of all chip
> types from the start?
>
> If not from the start, at what step can I combine data?

You can safely preprocess the different chip types independently.  For
simplicity, use doCRMAv2():

  http://aroma-project.org/blocks/doCRMAv2

Note the 'plm' argument.  Also, if you are interested in allele-specific
analysis (e.g. LOH), use doASCRMAv2() in place of doCRMAv2().

It is only at the segmentation step that you need to care about merging
chip types.  The segmentation model classes of the Aroma framework (e.g.
CbsModel) take care of the merging by simply interweaving the loci/total
CN estimates from multiple chip types (if such are available for the
sample currently being segmented).  Using do[AS]CRMAv2(), you will
basically get an AromaUnitTotalCnBinarySet for each chip type.  If you
place those in an R list, e.g.

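# One preprocessed data set per chip type; '...' stands for the remaining
# arguments (e.g. the data set to process).  The 250K chips use plm="RmaPlm".
dsList <- list();
dsList[["GenomeWideSNP_6"]] <- doCRMAv2(..., chipType="GenomeWideSNP_6");
dsList[["Mapping250K_Nsp"]] <- doCRMAv2(..., chipType="Mapping250K_Nsp",
                                        plm="RmaPlm");
dsList[["Mapping250K_Sty"]] <- doCRMAv2(..., chipType="Mapping250K_Sty",
                                        plm="RmaPlm");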

then you can simply do

sm <- CbsModel(dsList);

and proceed as illustrated in the vignette 'Total copy-number segmentation
(non-paired CBS)' [http://aroma-project.org/vignettes/NonPairedCBS].
The same idea of merging chip types is also used in the vignette 'Total
copy number analysis using CRMA v1 (10K, 100K, 500K)'
[http://aroma-project.org/vignettes/CRMAv1].
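
To make the "interweaving" concrete: for a given sample, the loci from all chip types it was run on are pooled and ordered by genomic position, so the segmentation method sees one denser sequence of CN estimates. The base R sketch below only illustrates that idea; it is not CbsModel's code, and the per-chip-type data frames (columns chromosome, position, cn) are a hypothetical, simplified layout rather than the actual aroma data structures.

```r
# Hypothetical per-chip-type CN estimates for one sample on one chromosome.
cnList <- list(
  GenomeWideSNP_6 = data.frame(chromosome = 1, position = c(100, 400, 700),
                               cn = c(2.1, 1.9, 3.0)),
  Mapping250K_Nsp = data.frame(chromosome = 1, position = c(250, 550),
                               cn = c(2.0, 3.1))
)

interweave <- function(cnList) {
  # Tag each locus with the chip type it came from, then stack all chip types.
  parts <- lapply(names(cnList), function(ct) {
    df <- cnList[[ct]]
    df$chipType <- ct
    df
  })
  merged <- do.call(rbind, parts)
  # Order all loci by genomic location so that segmentation sees a single
  # sequence of (position, CN) pairs from all chip types.
  merged <- merged[order(merged$chromosome, merged$position), ]
  rownames(merged) <- NULL
  merged
}

merged <- interweave(cnList)
print(merged)
```

The merged sequence has more loci per region than either chip type alone, which is where the extra power to detect change points comes from.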

What you need to be careful about is how your array files are named,
because that is key for CbsModel to be able to identify which array
files map to the same sample/individual.  This is also mentioned in the
"CRMAv1" vignette.  Note that you do not physically have to rename
your array/CEL files.  Instead you can utilize so called full-name
translators, cf. how-to page 'How to: Use fullname translators to
rename data files'
[http://aroma-project.org/howtos/setFullNamesTranslator].  These can
be applied after doing preprocessing (e.g. CRMAv2), so you don't have
to worry about that until segmentation.
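
As an illustration, suppose (hypothetically) that the CEL files of one patient were named like "P02_NSP" and "P02_STY", so the array names differ between chip types.  A translator then only has to map each full name to one where the sample name (the part before the first comma) is shared across chip types, with the chip-specific suffix turned into a tag.  The function below is plain base R and its naming convention is an assumption; attaching it to a data set is done as described on the how-to page above.

```r
# Hypothetical full-names translator: "P02_NSP,rep1" -> "P02,NSP,rep1", so that
# the array name ("P02") is identical across chip types.
fnt <- function(names, ...) {
  # Move a trailing _NSP/_STY/_SNP6 suffix of the name part into a tag.
  sub("^([^,]+)_(NSP|STY|SNP6)(,|$)", "\\1,\\2\\3", names, perl = TRUE)
}

fullnames <- c("P02_NSP", "P02_STY,rep1", "P01_SNP6")
print(fnt(fullnames))
# returns c("P02,NSP", "P02,STY,rep1", "P01,SNP6")

# With aroma, per the how-to page, the translator would be attached to each
# data set before segmentation, e.g. setFullNamesTranslator(ds, fnt).
```

Names that do not follow the assumed pattern are returned unchanged, so a mismatch shows up as arrays that are not merged rather than as an error.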

Potential problems: In the merging step, there is nothing specific
that is done to make sure that the CN estimates from the different
chip types to be merged are on the same scale, i.e. same observed CN
mean levels for the same underlying/true CN level.  It simply assumes
that this has been taken care of by the preprocessing method.  I'd
say small discrepancies are alright, because merging will still
increase the power to detect change points, which is the number one
objective of segmentation methods such as CBS.  If there are large
discrepancies (which I doubt you'll see), you may have to normalize the
CN estimates to be on the same linear scale, cf. the vignette 'MSCN:
Multi-source copy-number normalization'
[http://aroma-project.org/vignettes/MSCN].  As you can see in the MSCN
paper (Bengtsson et al. 2009; http://aroma-project.org/publications/),
bringing estimates onto the same scale before merging improves the power
to detect change points compared to merging them as is.
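
A rough check for such discrepancies is to compare, for one sample, CN levels from two chip types over the same genomic bins and fit a linear map between them.  The base R sketch below does this on synthetic, noiseless numbers; it is a simplified illustration of "same scale", not the MSCN method, and the binned inputs (e.g. means of loci within common fixed-width bins) are assumed to have been computed beforehand.

```r
# Synthetic binned CN means for one sample over the same genomic bins:
# chip B reads 10% low plus a small offset relative to chip A.
cnA <- c(2.0, 2.0, 3.0, 1.0, 2.0, 4.0)
cnB <- 0.9 * cnA + 0.05

# Fit cnA ~ a + b * cnB across bins and map chip B onto chip A's scale.
fit <- lm(cnA ~ cnB)
cnBaligned <- unname(predict(fit, newdata = data.frame(cnB = cnB)))

print(coef(fit))
# Largest remaining difference after alignment (essentially zero here,
# because the synthetic data has no noise).
print(max(abs(cnBaligned - cnA)))
```

A slope near 1 and an intercept near 0 would suggest the two chip types are already on comparable scales.  With real, noisy data an ordinary least-squares slope is biased toward zero, so treat this only as a quick diagnostic and see the MSCN vignette for a proper normalization.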

Hope this helps get you started.

Henrik

>
> Any advice, or pointers to documentation on this issue of combining data
> from multiple chip types that I have not yet found, would be appreciated.
>
> Best
>
> Steve
>
>
> require("aroma.affymetrix")
>
> log <- verbose <- Arguments$getVerbose(-9, timestamp=TRUE)
> ## Don't display too many decimals.
> options(digits=5)
>
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
> print(cdf)
>
> gi <- getGenomeInformation(cdf)
> print(gi)
>
> si <- getSnpInformation(cdf)
> print(si)
>
> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname = FALSE))
> print(acs)
>
> csR <- AffymetrixCelSet$byName("Primary", cdf = cdf)
> print(csR)
>
> cs <- csR
>
> par(mar = c(4, 4, 4, 1) + 0.1)
> plotDensity(cs, lwd = 2, ylim = c(-0.1, 0.80))
> stext(side = 3, pos = 0, getFullName(cs))
> filename <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs), getChipType(cs))
> dev.print(pdf, file = filename, width = 7, height = 5)
>
> ### 500K
>
>
> cdf5N <- AffymetrixCdfFile$byChipType("Mapping250K_Nsp")
> print(cdf5N)
>
> gi5N <- getGenomeInformation(cdf5N)
> print(gi5N)
>
> si5N <- getSnpInformation(cdf5N)
> print(si5N)
>
> acs5N <- AromaCellSequenceFile$byChipType(getChipType(cdf5N, fullname = FALSE))
> print(acs5N)
>
> csR5N <- AffymetrixCelSet$byName("Primary", cdf = cdf5N)
> print(csR5N)
>
> cs5N <- csR5N
>
> par(mar = c(4, 4, 4, 1) + 0.1)
> plotDensity(cs5N, lwd = 2, ylim = c(-0.1, 0.80))
> stext(side = 3, pos = 0, getFullName(cs5N))
> filename5N <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs5N), getChipType(cs5N))
> dev.print(pdf, file = filename5N, width = 7, height = 5)
>
> … etc…
>
>
>
> Steven McKinney, Ph.D.
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
