Hi. Yes, the Aroma framework can handle this.
On Fri, Dec 2, 2011 at 12:19 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
> Hi all,
>
> I am running an analysis on Affymetrix SNP6, 250K Nsp and 250K Sty chip types.
> For various reasons, patient samples were assessed either on SNP6 chips or
> on 500K chipsets (250K Nsp and 250K Sty). To further complicate things,
> an occasional 250K Nsp chip processing failed, so some patients have data
> only on a 250K Sty chip.

Ok, so each sample is processed on one of:

1. GenomeWideSNP_6
2. Mapping250K_Sty
3. Mapping250K_Sty & Mapping250K_Nsp

> I see on the web page
>
> http://www.aroma-project.org/features
>
> the description
>
> COPY-NUMBER ANALYSIS:
> * Paired & non-paired copy-number analysis: All generations, i.e. 10K, 100K,
> 500K, 5.0 & 6.0. CBS & GLAD * segmentation methods. Combine data from
> multiple chip types.
>
>
> My question is, at what point can data from multiple chip types be combined?
>
> As I start my aroma.affymetrix analytic pipeline (shown below), I first
> process the GenomeWideSNP_6 chips, then the 250K Nsp, then the 250K Sty.
> Is this appropriate, or is there a way to combine processing of all chip
> types from the start?
>
> If not from the start, at what step can I combine data?

You can safely preprocess the different chip types independently. For
simplicity, use doCRMAv2();

http://aroma-project.org/blocks/doCRMAv2

Note the argument 'plm'. Also, as mentioned, if you are interested in
allele-specific analysis (e.g. LOH), use doASCRMAv2() in place of doCRMAv2().

It is at the segmentation step that you need to care about merging chip
types. The segmentation model classes of the Aroma framework (e.g. CbsModel)
will take care of the merging by simply interweaving the loci/total CN
estimates from multiple chip types (if such are available for the sample
currently being segmented).

Using do[AS]CRMAv2(), you will basically get an AromaUnitTotalCnBinarySet for
each chip type. If you place those in an R list, e.g.

  dsList <- list();
  dsList[["GenomeWideSNP_6"]] <- doCRMAv2(..., chipType="GenomeWideSNP_6");
  dsList[["Mapping250K_Nsp"]] <- doCRMAv2(..., chipType="Mapping250K_Nsp", plm="RmaPlm");
  dsList[["Mapping250K_Sty"]] <- doCRMAv2(..., chipType="Mapping250K_Sty", plm="RmaPlm");

you can simply do

  sm <- CbsModel(dsList);

and proceed as illustrated in vignette 'Total copy-number segmentation
(non-paired CBS)' [http://aroma-project.org/vignettes/NonPairedCBS]. This
idea of merging chip types is also used in vignette 'Vignette: Total copy
number analysis using CRMA v1 (10K, 100K, 500K)'
[http://aroma-project.org/vignettes/CRMAv1].

What you need to be careful about is how your array files are named, because
that is key for CbsModel to be able to identify which array files map to the
same sample/individual. This is also mentioned in the "CRMAv1" vignette.
Note that you do not physically have to rename your array/CEL files. Instead
you can utilize so-called full-name translators, cf. how-to page 'How to: Use
fullname translators to rename data files'
[http://aroma-project.org/howtos/setFullNamesTranslator]. These can be
applied after doing preprocessing (e.g. CRMAv2), so you don't have to worry
about that until segmentation.

Potential problems:

In the merging step, nothing specific is done to make sure that the CN
estimates from the different chip types to be merged are on the same scale,
i.e. that they have the same observed CN mean levels for the same
underlying/true CN level. It simply assumes that this has been taken care of
by the preprocessing method. I'd say small discrepancies are alright, because
merging will still increase the power to detect change points, which is the
number one objective of segmentation methods such as CBS. If there are large
discrepancies (which I doubt you'll see), you may have to normalize CN
estimates to be on the same linear scale, cf. vignette 'MSCN: Multi-source
copy-number normalization' [http://aroma-project.org/vignettes/MSCN].
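
To make the interweaving and the scale issue concrete, here is a toy sketch in base R. It is not what CbsModel does internally and it does not use any Aroma functions; the positions, the noise level, the change point at 600 kb and the 0.1 offset between the two sources are all made-up assumptions for illustration.

```r
# Toy illustration only: simulated log2 CN ratios, not real Aroma output.
set.seed(1)

# Hypothetical locus positions (bp) for two chip types on one chromosome
posA <- sort(sample(1:1e6, size = 200))
posB <- sort(sample(1:1e6, size = 150))

# Assumed true signal shared by both sources: a gain beyond 600 kb
truth <- function(pos) ifelse(pos > 6e5, 0.5, 0)

# Source B carries an assumed global offset of 0.1 (a scale discrepancy)
cnA <- truth(posA) + rnorm(length(posA), sd = 0.2)
cnB <- truth(posB) + 0.1 + rnorm(length(posB), sd = 0.2)

# "Interweave": stack the loci from both sources and order by position
merged <- data.frame(
  position = c(posA, posB),
  cn       = c(cnA, cnB),
  source   = rep(c("A", "B"), times = c(length(posA), length(posB)))
)
merged <- merged[order(merged$position), ]

# Compare mean levels per source in a region assumed to be copy neutral
neutral <- merged$position <= 6e5
meanLevels <- tapply(merged$cn[neutral], merged$source[neutral], mean)
offset <- unname(meanLevels["B"] - meanLevels["A"])

print(meanLevels)
print(offset)  # rough estimate of the offset between the two sources
```

If an estimate like 'offset' is small relative to the noise, merging as-is is probably fine; if it is large, that is the situation where a proper multi-source normalization is worth considering.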

As you can see in the MSCN paper (Bengtsson et al. 2009;
http://aroma-project.org/publications/), bringing the estimates onto the same
scale before merging improves the power to detect change points compared to
merging without doing so.

Hope this helps get you started

Henrik

>
> Any advice, or pointers to documentation on this issue of combining data from
> multiple chip types that I have not yet found, would be appreciated.
>
> Best
>
> Steve
>
>
> require("aroma.affymetrix")
>
> log <- verbose <- Arguments$getVerbose(-9, timestamp=TRUE)
> ## Don't display too many decimals.
> options(digits=5)
>
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
> print(cdf)
>
> gi <- getGenomeInformation(cdf)
> print(gi)
>
> si <- getSnpInformation(cdf)
> print(si)
>
> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname = FALSE))
> print(acs)
>
> csR <- AffymetrixCelSet$byName("Primary", cdf = cdf)
> print(csR)
>
> cs <- csR
>
> par(mar = c(4, 4, 4, 1) + 0.1)
> plotDensity(cs, lwd = 2, ylim = c(-0.1, 0.80))
> stext(side = 3, pos = 0, getFullName(cs))
> filename <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs), getChipType(cs))
> dev.print(pdf, file = filename, width = 7, height = 5)
>
> ### 500K
>
>
> cdf5N <- AffymetrixCdfFile$byChipType("Mapping250K_Nsp")
> print(cdf5N)
>
> gi5N <- getGenomeInformation(cdf5N)
> print(gi5N)
>
> si5N <- getSnpInformation(cdf5N)
> print(si5N)
>
> acs5N <- AromaCellSequenceFile$byChipType(getChipType(cdf5N, fullname = FALSE))
> print(acs5N)
>
> csR5N <- AffymetrixCelSet$byName("Primary", cdf = cdf5N)
> print(csR5N)
>
> cs5N <- csR5N
>
> par(mar = c(4, 4, 4, 1) + 0.1)
> plotDensity(cs5N, lwd = 2, ylim = c(-0.1, 0.80))
> stext(side = 3, pos = 0, getFullName(cs5N))
> filename5N <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs5N), getChipType(cs5N))
> dev.print(pdf, file = filename5N, width = 7, height = 5)
>
> … etc…
>
>
>
> Steven McKinney, Ph.D.
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/