Hi Henrik, I just did what you suggested and seems it's all ok, the checksum is fine, output below. Any idea what else might be causing the problem?
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); > # ASSERT CORRECTNESS > stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec")) > cdfM <- getMonocellCdf(cdf); > print(cdfM); AffymetrixCdfFile: Path: annotationData/chipTypes/GenomeWideSNP_6 Filename: GenomeWideSNP_6,Full,monocell.CDF Filesize: 416.27MB Chip type: GenomeWideSNP_6,Full,monocell RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 1681x1681 Number of cells: 2825761 Number of units: 1881415 Cells per unit: 1.50 Number of QC units: 4 > stopifnot(identical(getChecksum(cdfM), "3a797e24a62f4f2e5214a6bf819151da")); > getChecksum(cdfM) [1] "3a797e24a62f4f2e5214a6bf819151da" Ioana On Tuesday, 27 November 2012 21:40:54 UTC+2, Henrik Bengtsson wrote: > > Hi, > > this certainly looks weird and it's very puzzling. My last guess it > that your so called monocell CDF is corrupt (but I cannot see how that > happened). Verify that you get the same: > > library("aroma.affymetrix"); > > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); > # ASSERT CORRECTNESS > stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec")) > > cdfM <- getMonocellCdf(cdf); > print(cdfM); > ## AffymetrixCdfFile: > ## Path: annotationData/chipTypes/GenomeWideSNP_6 > ## Filename: GenomeWideSNP_6,Full,monocell.CDF > ## File size: 416.27 MB (436492121 bytes) > ## Chip type: GenomeWideSNP_6,Full,monocell > ## RAM: 0.00MB > ## File format: v4 (binary; XDA) > ## Dimension: 1681x1681 > ## Number of cells: 2825761 > ## Number of units: 1881415 > ## Cells per unit: 1.50 > ## Number of QC units: 4 > > # ASSERT CORRECTNESS > stopifnot(identical(getChecksum(cdfM), > "3a797e24a62f4f2e5214a6bf819151da")); > > > If you get an error at the latter step, then I recommend that you > remove that file by doing: > > pathnameM <- getPathname(cdfM); > file.remove(pathnameM); > stopifnot(!isFile(pathnameM)); > > and then also clear the file cache for aroma.affymetrix by doing: > > path <- getCachePath(dirs="aroma.affymetrix"); > removeDirectory(path, recursive=TRUE); > > Then regenerate the monocell CDF (it will take a long time): > > cdfM <- getMonocellCdf(cdf, verbose=-10); > > and again compare with the output I get, particularly: > > stopifnot(identical(getChecksum(cdfM), > "3a797e24a62f4f2e5214a6bf819151da")); > > If get to this point, it means that was the problem and you should be > able to start your analysis. > > > I'm really sorry that you get hit by this - it's really odd and it's > extremely rare that someone has to got through all this mess to get > things working - it should just work out of the box and hopefully it > will do after this. > > /Henrik > > On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache > <ioana.cu...@gmail.com <javascript:>> wrote: > > I am still not able to get same as you for units (so if I continue doing > > units2 <- fit(plm, verbose=-10) I am not getting the right thing > either). > > I ran everything again and here is the output of the last part: > > > >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) > >> units1 <- fitCnProbes(plm, verbose=verbose); > > Estimating single-probe CN units... > > Getting chip-effect set... > > Getting chip-effect set from data set... > > Getting chip-effect set from data set...done > > Getting chip-effect set...done > > Identifying CN units... > > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 > 935597 > > 935598 935599 ... > > Identifying CN units...done > > Identifying subset to fit... > > Identifying non-fitted units in chip-effect file... > > Pathname: > > > plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL > > > > Found indices cached on file > > Reading data for these 1881415 cells... > > Reading data for these 1881415 cells...done > > Looking for stdvs <= 0 indicating non-estimated units: > > int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... > > Identifying non-fitted units in chip-effect file...done > > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 > 935597 > > 935598 935599 ... > > Identifying subset to fit...done > > Identifying cell indices for CN units... > > Identifying cell indices for CN units...done > > Keeping only single-cell units... > > Single-cell units: > > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 > 935597 > > 935598 935599 ... > > Cell indices: > > int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 > > 3660159 3524469 3264141 ... > > Keeping only single-cell units...done > > Identifying cell indices for estimates... > > int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 > 1876060 > > 1876061 1876062 1876063 ... > > Identifying cell indices for estimates...done > > Fitting 1 arrays... > > Array #1 ('91228050T') of 1... > > Reading signals... > > num [1:945826] 6448 7260 7878 6151 12230 ... > > Reading signals...done > > Transforming signals to estimates... > > 'data.frame': 945826 obs. of 4 variables: > > $ cell : int 1876054 1876055 1876056 1876057 1876058 1876059 > > 1876060 1876061 1876062 1876063 ... > > $ theta : num 6448 7260 7878 6151 12230 ... > > $ sdTheta : num 1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ... > > $ outliers: logi FALSE FALSE FALSE FALSE FALSE FALSE ... > > Transforming signals to estimates...done > > Writing estimates... > > Storing flat data to file... > > Updating file... > > Updating file...done > > Storing flat data to file...done > > Writing estimates...done > > Array #1 ('91228050T') of 1...done > > used (Mb) gc trigger (Mb) max used (Mb) > > Ncells 483738 25.9 1733216 92.6 12913475 689.7 > > Vcells 28397870 216.7 83594476 637.8 130588565 996.4 > > Fitting 1 arrays...done > > Estimating single-probe CN units...done > >> str(units1); > > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 > > 935598 935599 ... > >> units <- findUnitsTodo(plm); > >> str(units); > > int [1:935589] 1 2 3 4 5 6 7 8 9 10 ... > > > > On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote: > >> > >> On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache > >> <ioana.cu...@gmail.com> wrote: > >> > Ok, so the first part ran without any errors, output below: > >> > > >> >> print(csN); > >> > AffymetrixCelSet: > >> > Name: test > >> > Tags: ACC,ra,-XY,BPN,-XY > >> > Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6 > >> > Platform: Affymetrix > >> > Chip type: GenomeWideSNP_6,Full > >> > Number of arrays: 1 > >> > Names: 91228050T [1] > >> > Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51 > >> > Total file size: 65.83MB > >> > RAM: 0.01MB > >> >> print(getFullNames(csN)); > >> > [1] "91228050T" > >> >> cdf2 <- getCdf(csN); > >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # > IMPORTANT > >> >> > >> > > >> > However I am not getting same as you for the units: > >> > > >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); > >> >> units <- findUnitsTodo(plm); > >> >> str(units); > >> > int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... > >> > >> Alright (but has nothing to do with the error you first reported). If > so, > >> do: > >> > >> units1 <- fitCnProbes(plm, verbose=verbose); > >> str(units1); > >> ## int [1:945826] 935590 935591 935592 935593 935594 935595 935596 > >> 935597 935598 935599 ... > >> > >> and you should get the same as: > >> > >> units <- findUnitsTodo(plm); > >> str(units); > >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... > >> > >> If so, the try with > >> > >> units2 <- fit(plm, verbose=-10); > >> > >> and compare to the output I already pasted in my previous message > (below). > >> > >> /Henrik > >> > >> > > >> > > >> > On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote: > >> >> > >> >> Hmm... first run this in a fresh R session: > >> >> > >> >> library("aroma.affymetrix"); > >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); > >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf); > >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2"); > >> >> csC <- process(acc, verbose=verbose); > >> >> bpn <- BasePositionNormalization(csC, target="zero"); > >> >> csN <- process(bpn, verbose=verbose); > >> >> > >> >> then what's the output of: > >> >> print(csN); > >> >> print(getFullNames(csN)); > >> >> cdf2 <- getCdf(csN); > >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # > IMPORTANT > >> >> > >> >> If you get an error above, what is the output of print(cdf2) and > >> >> getChecksum(cdf2)? If not, continue: > >> >> > >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); > >> >> units <- findUnitsTodo(plm); > >> >> str(units); > >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... > >> >> > >> >> Do you get the same? If so, continue: > >> >> > >> >> units2 <- fit(plm, verbose=-10); > >> >> > >> >> You should get something like the following, except from the > >> >> dataset/sample names (if not, let me know): > >> >> > >> >> ## Fitting model of class AvgCnPlm... > >> >> ## AvgCnPlm: > >> >> ## Data set: GSE13372 > >> >> ## Chip type: GenomeWideSNP_6,Full > >> >> ## Input tags: testset,ACC,ra,-XY,BPN,-XY > >> >> ## Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B > >> >> ## Parameters: {probeModel: chr "pm", shift: num 0, flavor: chr > >> >> "median", mergeStrands: logi TRUE, combineAlleles: logi TRUE} > >> >> ## Path: > >> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6 > >> >> ## RAM: 0.01MB > >> >> ## Identifying non-estimated units... > >> >> ## Identifying non-fitted units in chip-effect file... > >> >> ## Pathname: > >> >> > >> >> > >> >> > plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL > > > >> >> ## Found indices cached on file > >> >> ## Reading data for these 1881415 cells... > >> >> ## Reading data for these 1881415 cells...done > >> >> ## Looking for stdvs <= 0 indicating non-estimated units: > >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... > >> >> ## Identifying non-fitted units in chip-effect file...done > >> >> ## Identifying non-estimated units...done > >> >> ## Getting model fit for 934968 units. > >> >> ## Identifying unit types:... > >> >> ## Units: > >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... > >> >> ## Unit types: > >> >> ## atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ... > >> >> ## - attr(*, "types")= Named int [1:8] 0 1 2 3 4 5 6 7 > >> >> ## ..- attr(*, "names")= chr [1:8] "unknown" "expression" > >> >> "genotyping" "resequencing" ... > >> >> ## unitTypes > >> >> ## 2 > >> >> ## 934968 > >> >> ## Unique unit types: > >> >> ## genotyping > >> >> ## 2 > >> >> ## Identifying unit types:...done > >> >> ## Setting up parameter sets... > >> >> ## Creating CEL file... > >> >> ## Chip type: GenomeWideSNP_6,Full > >> >> ## Pathname: > >> >> > >> >> > >> >> > plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL > > > >> >> ## Returning already existing file. > >> >> ## Creating CEL file...done > >> >> ## used (Mb) gc trigger (Mb) max used (Mb) > >> >> ## Ncells 549305 29.4 1606180 85.8 13015638 695.2 > >> >> ## Vcells 7692698 58.7 23004541 175.6 40154683 306.4 > >> >> ## Setting up parameter sets...done > >> >> ## Fitting AvgCnPlm for each unit type separately... > >> >> ## Unit types: > >> >> ## genotyping > >> >> ## 2 > >> >> ## Unit type #1 ('genotyping') of 1... > >> >> ## Unit type: genotyping (code=2) > >> >> ## Number of units of this type: 934968 > >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... > >> >> ## Fitting the model by unit dimensions (at least for the large > >> >> classes)... > >> >> ## Grouping units into equivalent (unit,group,cell) > dimensions... > >> >> [...] > >> >> > >> >> What is particular interesting are the outputs of your unit types > and > >> >> unit indices. > >> >> > >> >> /Henrik > >> >> > >> >> On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache > >> >> <ioana.cu...@gmail.com> wrote: > >> >> >> library("aroma.affymetrix") > >> >> >> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE) > >> >> >> options(digits=4) > >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", > tags="Full") > >> >> >> print(cdf) > >> >> >> gi <- getGenomeInformation(cdf) > >> >> >> print(gi) > >> >> >> si <- getSnpInformation(cdf) > >> >> >> print(si) > >> >> >> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, > >> >> >> fullname=FALSE)) > >> >> >> print(acs) > >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", > tags="Full") > >> >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf) > >> >> >> print(csR) > >> >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2") > >> >> >> print(acc) > >> >> >> csC <- process(acc, verbose=verbose) > >> >> >> print(csC) > >> >> >> bpn <- BasePositionNormalization(csC, target="zero") > >> >> >> print(bpn) > >> >> >> csN <- process(bpn, verbose=verbose) > >> >> >> print(csN) > >> > > >> > -- > >> > When reporting problems on aroma.affymetrix, make sure 1) to run the > >> > latest > >> > version of the package, 2) to report the output of sessionInfo() and > >> > traceback(), and 3) to post a complete code example. > >> > > >> > > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. > >> > To post to this group, send email to aroma-af...@googlegroups.com > >> > To unsubscribe and other options, go to > >> > http://www.aroma-project.org/forum/ > > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the > latest > > version of the package, 2) to report the output of sessionInfo() and > > traceback(), and 3) to post a complete code example. > > > > > > You received this message because you are subscribed to the Google > Groups > > "aroma.affymetrix" group with website http://www.aroma-project.org/. > > To post to this group, send email to > > aroma-af...@googlegroups.com<javascript:> > > To unsubscribe and other options, go to > http://www.aroma-project.org/forum/ > -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/