Hi Henrik, Many thanks for your detailed response even when I didn't include some important information.
See comments below. On 3/11/10 23:00, "Henrik Bengtsson" <henrik.bengts...@gmail.com> wrote: Hi. On Wed, Nov 3, 2010 at 6:03 AM, Oscar Rueda <oscar.ru...@cancer.org.uk> wrote: > Hi all, > > I'm trying to normalize several thousands of tumor samples using aroma. The > BAF profiles I get using > >> AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=FALSE) > > are quite noisy, so I have started using ACNE for getting 'cleaner' ones. Yes, BAF profiles are quite noisy if not taking into account SNP-specific crosstalk effects, which ACNE and some other methods do. It is not clear from your message what chip type you are working on, but here I will assume GenomeWideSNP_6. Yes, This is GenomeWideSNP_6. > The problem is that I don't want to use the tumors as a reference, because > some of them have a lot of alterations. I understand this concern. Though, note that with a robust estimator together with the assumption that at any given SNP the majority of samples/arrays are normal/diploid, - (CA,CB) = (2,0), (1,1) or (0,2) - then the estimates of the crosstalk parameters should be ok. However, I agree that if you know a sample is not-diploid it is best to exclude it. The safe version of the latter is to only use "normal" samples. Before answering your specific ACNE-related questions/problems, are you aware of the TumorBoost method which is designed for the case where you have a matched normal for each tumor? Bengtsson, H.; Neuvial, P. & Speed, T. P. TumorBoost: Normalization of allele-specic tumor copy numbers from a single pair of tumor-normal genotyping microarrays BMC Bioinformatics, 2010. (Downloadable via http://aroma-project.org/publications/) Using that, you can normalize the tumor BAFs utilizing the match normal so that the normalized BAFs have much(!) greater signal-to-noise ratios. It won't/cannot normalize your normal samples - only the tumors - but if that is all you need it may be easier to use. It also needs a matched normal for *each* tumor; you cannot normalize tumors without a matched normal. Another advantage is that if you preprocess with AS-CRMAv2 and BAF normalize with TumorBoost you have a truly single-pair processing method, i.e. you can process your tumor-normal pairs one by one independently of all other pairs you have. You will find vignettes from TumorBoost at http://aroma-project.org/vignettes/. FYI, we are aware that the high-level version of TumorBoost for the aroma framework is still a bit tedious to apply. We simply haven't decided on the final design where to put the result files etc. Regardless, the results are still correct. Depending on what you are doing downstream, you might find the low-level/all-in-memory version much easier to use. Thanks for this! I remember reading that paper some time ago, so I'll definitely give it a try for the matched pairs. > With aroma I used the 'target=0' > option to get log intensities That option I do not follow?!? Where do you use 'target=0'? Even if you get signals that are on average near zero, I suspect that you are misinterpreting the results/output, which most likely are *not* log-intensities. Please show some code - actually show all you code to avoid ambiguities. This is the code I use: > library(aroma.affymetrix) > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") > gi <- getGenomeInformation(cdf) > si <- getSnpInformation(cdf) > geneinform <- getGenomeInformation(cdf) > snpinform <- getSnpInformation(cdf) > cs.comb<- AffymetrixCelSet$byName("Project1", chipType = > "GenomeWideSNP_6",cdf = cdf) > getFullName(cs.comb) > getPath(cs.comb) > getNames(cs.comb) > samplenames = getNames(cs.comb) > setCdf(cs.comb, cdf) > acc <- AllelicCrosstalkCalibration(cs.comb,model="CRMAv2") > csC <- process(acc, ram=28,verbose=verbose) > bpn <- BasePositionNormalization(csC, target="zero") > csN <- process(bpn, ram=28,verbose=verbose) > plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=FALSE) > if (length(findUnitsTodo(plm)) > 0) { > units <- fitCnProbes(plm, verbose=verbose) > str(units) > units <- fit(plm, ram=28, verbose=verbose) > str(units) > } > ces <- getChipEffectSet(plm) > fln <- FragmentLengthNormalization(ces, target="zero") > cesN <- process(fln, verbose=verbose) > cdf.mono <- getCdf(cesN); > for (kk in 1:24) { > units <- getUnitsOnChromosome(geneinform, chromosome=kk); > pos <- getPositions(geneinform, units=units); > unitNames <- getUnitNames(cdf.mono, units=units); > theta <- extractMatrix(cesN, units=1, field = "theta"); > theta <- colnames(theta) > Vals <- extractTheta(cesN, units=units) > Vals <- apply(Vals, c(1,2), cbind) > rownames(Vals) <- unitNames; > colnames(Vals) <- rep(theta, rep(2, length(theta))) > pos.log2I.val<-cbind(pos,Vals) > write.table(pos.log2I.val, > paste("Project1_log2I.pos_Chrm",kk,".txt",sep="")) > } So my understanding is that these are intensities, not ratios. I can obtain ratios later using a reference. > and then I normalized them against my pool of > normals or against them matched normal if available just subtracting the > values. I need to see your code to know what you are doing. > For ACNE it seems I have to specify the reference, so I created folders for > each of my matched pairs and used > >> NmfSnpPlm(csN, mergeStrands=TRUE, refs=c(FALSE, TRUE)) > > To indicate that the second array is the normal and the first the tumor. > > Is this the best way to proceed in my situation? Nope. When you find yourself having to do tedious "tricks" such as splitting up your data set to data sets containing a single tumor-normal pair take it as sign for "there must be a better way to do this" (which I guess is one reason why you posted this message). That's what I was thinking! :-) > > The former command produces the following values for chromosome 2: > >> summary(data) > total freqB > Min. : 0.0 Min. : NA > 1st Qu.: 0.0 1st Qu.: NA > Median : 905.6 Median : NA > Mean : 2345.2 Mean : NaN > 3rd Qu.: 4192.8 3rd Qu.: NA > Max. :53200.6 Max. : NA > NA's :153732 > > So it's clear that I'm doing something wrong. Does anyone know what is it? The problem is that you are asking the ACNE method to estimate the (SNP-specific) crosstalk parameters using only two arrays, actually only one since you specify that it is only the normal sample that should be used for the estimation. The model parameters are non-identifiable with only one sample, i.e. it is not possible to estimate the ACNE parameter. It should probably be added to the ACNE method/estimator (somewhere in the NmfSnpPlm class) to assert that not too few reference samples are used, instead of returning missing values. Instead, you should keep all your samples in the same directory and process the whole data set at once. You can specify which samples should be used as the reference set by using argument 'refs' to NmfSnpPlm, just as above. For example: refs <- seq(from=2, to=length(csN), by=2); nmf <- NmfSnpPlm(csN, mergeStrands=TRUE, refs=refs); Note that you might use a more clever approach to identify which samples are normals. For example, if the normal samples contain the tag "N", you can do: refs <- which(sapply(csN, hasTag, "N")); Hope this solves your problem. /Henrik -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/ Thanks a lot for your answers. Here is the sessionInfo() R version 2.11.1 (2010-05-31) x86_64-pc-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sfit_0.1.9 ACNE_0.4.2 MASS_7.3-7 [4] aroma.affymetrix_1.7.0 aroma.apd_0.1.7 affxparser_1.20.0 [7] R.huge_0.2.0 aroma.core_1.7.0 aroma.light_1.18.2 [10] matrixStats_0.2.2 R.rsp_0.4.0 R.cache_0.3.0 [13] R.filesets_0.9.0 digest_0.4.2 R.utils_1.5.3 [16] R.oo_1.7.4 R.methodsS3_1.2.1 loaded via a namespace (and not attached): [1] tools_2.11.1 Oscar M. Rueda, PhD Postdoc, Breast Cancer Functional Genomics Cancer Research UK Cambridge Research Institute Li Ka Shing Centre Robinson Way Cambridge CB2 0RE England This communication is from Cancer Research UK. Our website is at www.cancerresearchuk.org. We are a registered charity in England and Wales (1089464) and in Scotland (SC041666) and a company limited by guarantee registered in England and Wales under number 4325234. Our registered address is Angel Building, 407 St John Street, London, EC1V 4AD. Our central telephone number is 020 7242 0200. This communication and any attachments contain information which is confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s) please note that any form of disclosure, distribution, copying or use of this communication or the information in it or in any attachments is strictly prohibited and may be unlawful. If you have received this communication in error, please notify the sender and delete the email and destroy any copies of it. E-mail communications cannot be guaranteed to be secure or error free, as information could be intercepted, corrupted, amended, lost, destroyed, arrive late or incomplete, or contain viruses. We do not accept liability for any such matters or their consequences. Anyone who communicates with us by e-mail is taken to accept the risks in doing so. -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/