Hi, I could reproduce the error with the data you've sent me offline. I've located the bug, which indeed was an edge-case bug, and I've fixed it. Update to PSCBS v0.35.0:
source("http://aroma-project.org/hbLite.R"); hbInstall("PSCBS"); I've verified that it works with your data. Below is the script I used. stopifnot(packageVersion("PSCBS") >= "0.35.0"); library("PSCBS"); library("R.devices"); devOptions("png", width=1024); setOption("devEval/args/force", FALSE); sampleName <- "ingrid"; ## Load the data from a previously saved segmentation object. pathname <- sprintf("%s.Rbin", sampleName); fit <- loadObject(pathname); data <- getLocusData(fit); rm(fit); ## Locate centromeres and other large gaps in data gaps <- findLargeGaps(data, minLength=1e6); knownSegments <- gapsToSegments(gaps); ## => dim(knownSegments) == c(64, 4) ## 'betaT' is already normalized using TumorBoost => tbn=FALSE. fit <- segmentByNonPairedPSCBS(data, knownSegments=knownSegments, avgDH="median", seed=0xBEEF, verbose=-10); printf("Number of segments: %d\n", nbrOfSegments(fit)); ## Number of segments: 568 toPNG(sampleName, tags="tracks,avgDH=median", aspectRatio=0.6, { plotTracks(fit); }); ## Estimate global background level in [0,1] (due to normal contamination and more) kappa <- estimateKappa(fit, verbose=-10); printf("Kappa: %g\n", kappa); ## Kappa: 0.374814 ## Call allelic balance ## (a) Estimate DH threshold for calling AB deltaAB <- estimateDeltaAB(fit, kappa=kappa); # If skipped, will be done internally printf("Delta_AB: %g\n", deltaAB); ## Delta_AB: 0.146787 ## (b) Call AB based on bootstrapped segment DH levels fit <- callAB(fit, delta=deltaAB, verbose=-100); printf("Number of AB segments: %d\n", sum(getSegments(fit)$abCall, na.rm=TRUE)); ## Number of AB segments: 193 toPNG(sampleName, tags="tracks,avgDH=median,AB", aspectRatio=0.6, { plotTracks(fit); }); ## Call loss of heterozygosity (LOH) ## (a) Estimate C_1 threshold for calling LOH deltaLOH <- estimateDeltaLOH(fit); # If skipped, will be done internally printf("Delta_LOH: %g\n", deltaLOH); ## Delta_LOH: 0.566545 ## (b) Call LOH based on bootstrapped segment C_1 levels fit <- callLOH(fit, delta=deltaLOH, verbose=-10); printf("Number of LOH segments: %d\n", sum(getSegments(fit)$lohCall, na.rm=TRUE)); ## Number of LOH segments: 149 toPNG(sampleName, tags="tracks,avgDH=median,AB+LOH", aspectRatio=0.6, { plotTracks(fit); }); ## Call NTCN ## (a) Estimate the threshold for calling neutral TCN segments ## By shrinking 'scale', more segments will be non-NTCN. deltaCN <- estimateDeltaCN(fit, scale=1.0, kappa=kappa); printf("Delta_CN: %g\n", deltaCN); ## Delta_CN: 0.312593 ## (b) Call NTCN based on bootstrapped segment TCN levels fit <- callNTCN(fit, delta=deltaCN, verbose=-20); printf("Number of NTCN segments: %d\n", sum(getSegments(fit)$ntcnCall, na.rm=TRUE)); ## Number of NTCN segments: 303 toPNG(sampleName, tags="tracks,avgDH=median,AB+LOH+NTCN", aspectRatio=0.6, { plotTracks(fit); }); /Henrik On Mon, Apr 22, 2013 at 6:00 PM, Henrik Bengtsson <henrik.bengts...@aroma-project.org> wrote: > Hi Ingrid. > > [cleaning up by dropping everything but the > bootstrapTCNandDHByRegion() error; will reply to the other stuff > separately.] > > On Sun, Apr 21, 2013 at 8:33 PM, Ingrid Lönnstedt > <ingrid.lonnst...@gmail.com> wrote: >> >> Hi Henrik, >> Thank you for your answers! You do a great job, indeed! I am now back to >> this. See a couple of remaining issues below. >> /Ingrid >> >> >> >> On Wednesday, 27 March 2013 18:16:07 UTC+11, Henrik Bengtsson wrote: >>> >>> On Tue, Mar 26, 2013 at 10:03 PM, Ingrid Lönnstedt >>> <ingrid.l...@gmail.com> wrote: >>> > >>> > Thank you Henrik! >>> > OK, I followed your instructions, and the same problem (the error in >>> > callAB()) remains, or rather, it now occurs already in sample one. Also, > > [snip] > >> So, my main problem that remains is insanity check failures in callAB(). I >> have followed my sample 1 into bootstrapTCNandDHByRegion.PairedPSCBS with >> debug() and concluded that I have 592 segments, and the first failure occurs >> with the 233rd segment. I have printed the CBS info for segments 220 to 233, >> in case that would help, as well as the current error message and session >> Info. If there is a failure with segment 233, is there a reason not to >> report the callAB outcomes of the other segments? What does the sanity >> check failure tell me, and could I do any fix to get round it or at least >> get something for the other segments? >> >> Thanks, >> Ingrid >> >> >> Browse[3]> segs[220:233,] >> >> chromosome tcnId dhId tcnStart tcnEnd tcnNbrOfLoci tcnMean >> 220 9 11 1 73187382 125406672 38303 2.1977136 >> 221 9 12 1 125406672 141091395 8913 2.1506696 >> 222 NA NA NA NA NA 0 NA >> 223 10 1 1 72760 17932882 15166 2.0003566 >> 224 10 2 1 17932882 17956182 2 4.5480683 >> 225 10 3 1 17956182 18179544 18 1.9775509 >> 226 10 4 1 18179544 18202780 3 0.6127961 >> 227 10 5 1 18202780 24376470 4403 2.0286301 >> 228 10 6 1 24376470 24378482 11 0.6265435 >> 229 10 7 1 24378482 39076222 10124 2.0268394 >> 230 10 8 1 39076223 42433539 0 NaN >> 231 10 9 1 42433540 86411230 28982 2.0375534 >> 232 10 10 1 86411230 89029244 1697 2.0558821 >> 233 10 11 1 89029244 89115851 21 2.5505440 >> tcnNbrOfSNPs tcnNbrOfHets dhNbrOfLoci dhStart dhEnd dhMean >> 220 18928 5290 5290 73187382 125406672 0.07756713 >> 221 4002 1101 1101 125406672 141091395 0.07943612 >> 222 NA NA 0 NA NA NA >> 223 8456 2087 2087 72760 17932882 0.07487577 >> 224 0 0 0 17932882 17956182 NA >> 225 4 1 1 17956182 18179544 0.04524338 >> 226 0 0 0 18179544 18202780 NA >> 227 2212 543 543 18202780 24376470 0.07859021 >> 228 0 0 0 24376470 24378482 NA >> 229 4971 1086 1086 24378482 39076222 0.07712847 >> 230 0 0 0 39076223 42433539 NA >> 231 13983 3576 3576 42433540 86411230 0.06990635 >> 232 858 192 192 86411230 89029244 0.06439847 >> 233 4 1 1 89029244 89115851 0.36474615 >> c1Mean c2Mean >> 220 1.0136216 1.184092 >> 221 0.9899144 1.160755 >> 222 NA NA >> 223 0.9252892 1.075067 >> 224 NA NA >> 225 0.9440399 1.033511 >> 226 NA NA >> 227 0.9345998 1.094030 >> 228 NA NA >> 229 0.9352562 1.091583 >> 230 NA NaN >> 231 0.9475577 1.089996 >> 232 0.9617432 1.094139 >> 233 0.8101214 1.740423 >> >> deltaCN <- estimateDeltaCN(fit, scale=1) #0.2601609 for my sample >> fit = callAB(fit, delta = deltaCN, verbose = -10) >> List of 7 >> $ nbrOfTCNs : int 21 >> $ tcnNbrOfLoci : int 21 >> $ mu : num 2.58 >> $ tcnMean : num 2.55 >> $ dMu : num 0.0248 >> $ abs(dMu) : num 0.0248 >> $ range(x[units]): num [1:2] 89029244 89114401 >> >> Error in bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn, >> : >> INTERNAL ERROR: Incorrect TCN mean! > > This is an internal error that the *implementation* of the PSCBS > algorithm is correct and handles all "edge cases"; apparently there is > still another case that was not considered. These "edge cases" are > for instance when two SNPs share the exact same location but the > internal CBS segmentation method wants to put a change point in > between them because that's what the data happens to say. In other > words, it's nothing you're doing wrong and probably nothing you should > try to work around either. (I'll think about this a bit more since it > happens once in a while and it would be nice to be able to keep going. > On the other hand, if it would be just a warning you would have never > told me and the error would never be fixed.) > > First, update to the latest PSCBS v0.34.8 (on CRAN since yday) and > verify that the error remains. If it does, could you save the > segmentation result; > > saveObject(fit, "problem.Rdata") > > and make 'problem.Rdata' available to me for download somewhere? Then > I can troubleshoot. > > /Henrik > >> >> >> > sessionInfo() >> R version 2.15.2 Patched (2013-02-09 r61879) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] DNAcopy_1.32.0 PSCBS_0.34.1 aroma.affymetrix_2.8.0 >> [4] affxparser_1.30.2 aroma.apd_0.2.3 R.huge_0.4.1 >> [7] aroma.light_1.28.0 aroma.core_2.8.0 matrixStats_0.7.0 >> [10] R.rsp_0.8.2 R.devices_2.1.3 R.cache_0.6.5 >> [13] R.filesets_2.0.0 R.utils_1.23.2 R.oo_1.13.0 >> [16] R.methodsS3_1.4.2 >> >> loaded via a namespace (and not attached): >> [1] calmate_0.10.0 digest_0.6.3 MASS_7.3-23 >> >> >>> >>> > All the best, >>> > Ingrid >>> > >>> > >>> > On Saturday, 23 March 2013 05:03:50 UTC+11, Henrik Bengtsson wrote: >>> >> >>> >> Hi Ingrid. >>> >> >>> >> On Wed, Mar 20, 2013 at 4:45 PM, Ingrid Lönnstedt >>> >> <ingrid.l...@gmail.com> wrote: >>> >> > >>> >> > Hi! >>> >> > >>> >> > I have segmented 56 SNP 6 tumour samples with >>> >> > segmentByNonPairedPSCBS(), >>> >> > and like to call allelic balance segments with callAB() as required, >>> >> > I >>> >> > understand, before I can call neutral TCNs with callNTCN(). >>> >> > > [snip] > >>> >> >>> >> source("http://aroma-project.org/hbLite.R"); >>> >> hbInstall("PSCBS"); >>> >> >>> >> >>> >> >>> >> > For the first >>> >> > tumour sample, both callAB() and callNTCN() seem to work fine, but >>> >> > then on >>> >> > the second sample I get errors with callAB(), see below. >>> >> >>> >> See comment after the error. >>> >> >>> >> > The plotTracks() >>> >> > image looks pretty much the same between sample one and two, and I >>> >> > have no >>> >> > reason to believe that sample two has a major problem in the >>> >> > segmentation. 4 >>> >> > of the 56 samples get very noisy segmentations, with lots of short >>> >> > regions >>> >> > scattered (has anybody seen that before?) > > [snip] > >>> >> > betaT = dataN[,'fracB',samp] >>> >> > ugp = getAromaUgpFile(dsNList$total) >>> >> > chromosome = ugp[,1,drop = T] >>> >> > x = ugp[, 2, drop = T] >>> >> > df = data.frame(chromosome = chromosome, x=x, CT=CT, betaT = >>> >> > betaT) >>> >> > df = dropSegmentationOutliers(df) >>> >> > #This gives warnings: >>> >> > #24: In DNAcopy::CNA(genomdat = yKK, chrom = chromosomeKK, >>> >> > ... : >>> >> > #array has repeated maploc positions >>> >> > gaps = findLargeGaps(df, minLength = 1e+06) >>> >> > knownSegments = gapsToSegments(gaps) >>> >> > >>> >> > fit = segmentByNonPairedPSCBS(df, knownSegments = knownSegments, >>> >> > avgDH="median", >>> >> > tauA = .33, >>> >> > seed = >>> >> > 48879, verbose = -10) >>> >> > #This gives warnings that there are multipe maploc positions >>> >> > fit = callAB(fit, verbose = -10) >>> >> > >>> >> > Calling segments of allelic balance from one-sided DH bootstrap >>> >> > confidence >>> >> > intervals... >>> >> > delta (offset adjusting for bias in DH): 0.211106073856354 >>> >> > alpha (CI quantile; significance level): 0.05 >>> >> > List of 7 >>> >> > $ nbrOfDHs : int 70 >>> >> > $ dhNbrOfLoci : int 70 >>> >> > $ mu : num 0.183 >>> >> > $ dhMean : num 0.203 >>> >> > $ dMu : num -0.02 >>> >> > $ abs(dMu) : num 0.02 >>> >> > $ range(x[units]): num [1:2] 990417 2792801 >>> >> > Error in bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = >>> >> > statsFcn, >>> >> > : >>> >> > INTERNAL ERROR: Incorrect DH mean! >>> >> > Calling segments of allelic balance from one-sided DH bootstrap >>> >> > confidence intervals...done >>> >> >>> >> This is a internal sanity check throwing an error. As a start, I >>> >> recommend to retry with (i) a proper CT calculate, and (ii) after >>> >> updated PSCBS. >>> >> >>> >> Hope this helps >>> >> >>> >> Henrik >>> >> >>> >> > >>> >> > >>> >> > > >>> >> > > head(getSegments(fit)) >>> >> > chromosome tcnId dhId tcnStart tcnEnd tcnNbrOfLoci tcnMean >>> >> > tcnNbrOfSNPs tcnNbrOfHets dhNbrOfLoci dhStart dhEnd >>> >> > 1 1 1 1 61736.0 811403.5 73 1.4735545 >>> >> > 15 2 2 61736.0 811403.5 >>> >> > 2 1 2 1 811403.5 2821394.0 527 0.9201369 >>> >> > 183 70 70 811403.5 2821394.0 >>> >> > 3 1 3 1 2821394.0 2827143.5 2 5.0273015 >>> >> > 2 0 0 2821394.0 2827143.5 >>> >> > 4 1 4 1 2827143.5 11694138.0 5531 0.9780667 >>> >> > 2940 983 983 2827143.5 11694138.0 >>> >> > 5 1 5 1 11694138.0 11709225.0 5 3.3949294 >>> >> > 5 3 3 11694138.0 11709225.0 >>> >> > 6 1 6 1 11709225.0 16923050.0 2918 1.0072282 >>> >> > 1485 550 550 11709225.0 16923050.0 >>> >> > dhMean c1Mean c2Mean >>> >> > 1 0.1695309 0.6118708 0.8616838 >>> >> > 2 0.2033105 0.3665317 0.5536052 >>> >> > 3 NA NA NA >>> >> > 4 0.2033807 0.3895734 0.5884933 >>> >> > 5 0.1465555 1.4486918 1.9462376 >>> >> > 6 0.2036166 0.4010699 0.6061583 >>> >> > >>> >> > >>> >> > > traceback() >>> >> > 7: stop("INTERNAL ERROR: Incorrect DH mean!") >>> >> > 6: bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn, >>> >> > ..., verbose = less(verbose, 50)) >>> >> > 5: bootstrapTCNandDHByRegion(fit, statsFcn = statsFcn, ..., verbose >>> >> > = >>> >> > less(verbose, >>> >> > 50)) >>> >> > 4: callAllelicBalanceByDH.PairedPSCBS(fit, ...) >>> >> > 3: callAllelicBalanceByDH(fit, ...) >>> >> > 2: callAB.PairedPSCBS(fit, verbose = -10) >>> >> > 1: callAB(fit, verbose = -10) >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > > sessionInfo() >>> >> > >>> >> > R version 2.15.2 Patched (2013-02-09 r61879) >>> >> > Platform: x86_64-unknown-linux-gnu (64-bit) >>> >> > >>> >> > locale: >>> >> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> >> > LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> >> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> >> > LC_PAPER=C >>> >> > LC_NAME=C >>> >> > [9] LC_ADDRESS=C LC_TELEPHONE=C >>> >> > LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >> > >>> >> > attached base packages: >>> >> > [1] splines stats graphics grDevices utils datasets >>> >> > methods >>> >> > base >>> >> > >>> >> > other attached packages: >>> >> > [1] Hmisc_3.10-1 survival_2.37-4 DNAcopy_1.32.0 >>> >> > PSCBS_0.30.0 aroma.affymetrix_2.8.0 >>> >> > [6] affxparser_1.30.2 aroma.apd_0.2.3 R.huge_0.4.1 >>> >> > aroma.light_1.28.0 aroma.core_2.8.0 >>> >> > [11] matrixStats_0.6.2 R.rsp_0.8.2 R.devices_2.1.3 >>> >> > R.cache_0.6.5 R.filesets_2.0.0 >>> >> > [16] crlmm_1.16.9 preprocessCore_1.20.0 >>> >> > oligoClasses_1.20.0 >>> >> > BiocGenerics_0.4.0 R.utils_1.19.5 >>> >> > [21] R.oo_1.11.7 R.methodsS3_1.4.2 >>> >> > >>> >> > loaded via a namespace (and not attached): >>> >> > [1] affyio_1.26.0 annotate_1.36.0 AnnotationDbi_1.20.5 >>> >> > Biobase_2.18.0 BiocInstaller_1.8.3 Biostrings_2.26.3 >>> >> > [7] bit_1.1-9 cluster_1.14.3 codetools_0.2-8 >>> >> > DBI_0.2-5 digest_0.6.3 ellipse_0.3-7 >>> >> > [13] ff_2.2-10 foreach_1.4.0 genefilter_1.40.0 >>> >> > GenomicRanges_1.10.7 grid_2.15.2 IRanges_1.16.6 >>> >> > [19] iterators_1.0.6 lattice_0.20-13 Matrix_1.0-11 >>> >> > mvtnorm_0.9-9994 parallel_2.15.2 RcppEigen_0.3.1.2.1 >>> >> > [25] RSQLite_0.11.2 stats4_2.15.2 tools_2.15.2 >>> >> > XML_3.95-0.1 xtable_1.7-1 zlibbioc_1.4.0 >>> >> > >>> >> > >>> >> > > >>> >> > >>> >> > >>> >> > >>> >> > -- >>> >> > -- >>> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the >>> >> > latest version of the package, 2) to report the output of >>> >> > sessionInfo() and >>> >> > traceback(), and 3) to post a complete code example. >>> >> > >>> >> > >>> >> > You received this message because you are subscribed to the Google >>> >> > Groups >>> >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. >>> >> > To post to this group, send email to aroma-af...@googlegroups.com >>> >> > To unsubscribe and other options, go to >>> >> > http://www.aroma-project.org/forum/ >>> >> > >>> >> > --- >>> >> > You received this message because you are subscribed to the Google >>> >> > Groups >>> >> > "aroma.affymetrix" group. >>> >> > To unsubscribe from this group and stop receiving emails from it, >>> >> > send an >>> >> > email to aroma-affymetr...@googlegroups.com. >>> >> > For more options, visit https://groups.google.com/groups/opt_out. >>> >> > >>> >> > >>> > >>> > -- >>> > -- >>> > When reporting problems on aroma.affymetrix, make sure 1) to run the >>> > latest version of the package, 2) to report the output of sessionInfo() >>> > and >>> > traceback(), and 3) to post a complete code example. >>> > >>> > >>> > You received this message because you are subscribed to the Google >>> > Groups "aroma.affymetrix" group with website >>> > http://www.aroma-project.org/. >>> > To post to this group, send email to aroma-af...@googlegroups.com >>> > To unsubscribe and other options, go to >>> > http://www.aroma-project.org/forum/ >>> > >>> > --- >>> > You received this message because you are subscribed to the Google >>> > Groups "aroma.affymetrix" group. >>> > To unsubscribe from this group and stop receiving emails from it, send >>> > an email to aroma-affymetr...@googlegroups.com. >>> > For more options, visit https://groups.google.com/groups/opt_out. >>> > >>> > -- -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/ --- You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To unsubscribe from this group and stop receiving emails from it, send an email to aroma-affymetrix+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.