Hi, this certainly looks weird and it's very puzzling. My last guess it that your so called monocell CDF is corrupt (but I cannot see how that happened). Verify that you get the same:
library("aroma.affymetrix"); cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec")) cdfM <- getMonocellCdf(cdf); print(cdfM); ## AffymetrixCdfFile: ## Path: annotationData/chipTypes/GenomeWideSNP_6 ## Filename: GenomeWideSNP_6,Full,monocell.CDF ## File size: 416.27 MB (436492121 bytes) ## Chip type: GenomeWideSNP_6,Full,monocell ## RAM: 0.00MB ## File format: v4 (binary; XDA) ## Dimension: 1681x1681 ## Number of cells: 2825761 ## Number of units: 1881415 ## Cells per unit: 1.50 ## Number of QC units: 4 # ASSERT CORRECTNESS stopifnot(identical(getChecksum(cdfM), "3a797e24a62f4f2e5214a6bf819151da")); If you get an error at the latter step, then I recommend that you remove that file by doing: pathnameM <- getPathname(cdfM); file.remove(pathnameM); stopifnot(!isFile(pathnameM)); and then also clear the file cache for aroma.affymetrix by doing: path <- getCachePath(dirs="aroma.affymetrix"); removeDirectory(path, recursive=TRUE); Then regenerate the monocell CDF (it will take a long time): cdfM <- getMonocellCdf(cdf, verbose=-10); and again compare with the output I get, particularly: stopifnot(identical(getChecksum(cdfM), "3a797e24a62f4f2e5214a6bf819151da")); If get to this point, it means that was the problem and you should be able to start your analysis. I'm really sorry that you get hit by this - it's really odd and it's extremely rare that someone has to got through all this mess to get things working - it should just work out of the box and hopefully it will do after this. /Henrik On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache <ioana.cutcuta...@gmail.com> wrote: > I am still not able to get same as you for units (so if I continue doing > units2 <- fit(plm, verbose=-10) I am not getting the right thing either). > I ran everything again and here is the output of the last part: > >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) >> units1 <- fitCnProbes(plm, verbose=verbose); > Estimating single-probe CN units... > Getting chip-effect set... > Getting chip-effect set from data set... > Getting chip-effect set from data set...done > Getting chip-effect set...done > Identifying CN units... > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 > 935598 935599 ... > Identifying CN units...done > Identifying subset to fit... > Identifying non-fitted units in chip-effect file... > Pathname: > plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL > Found indices cached on file > Reading data for these 1881415 cells... > Reading data for these 1881415 cells...done > Looking for stdvs <= 0 indicating non-estimated units: > int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... > Identifying non-fitted units in chip-effect file...done > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 > 935598 935599 ... > Identifying subset to fit...done > Identifying cell indices for CN units... > Identifying cell indices for CN units...done > Keeping only single-cell units... > Single-cell units: > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 > 935598 935599 ... > Cell indices: > int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 > 3660159 3524469 3264141 ... > Keeping only single-cell units...done > Identifying cell indices for estimates... > int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060 > 1876061 1876062 1876063 ... > Identifying cell indices for estimates...done > Fitting 1 arrays... > Array #1 ('91228050T') of 1... > Reading signals... > num [1:945826] 6448 7260 7878 6151 12230 ... > Reading signals...done > Transforming signals to estimates... > 'data.frame': 945826 obs. of 4 variables: > $ cell : int 1876054 1876055 1876056 1876057 1876058 1876059 > 1876060 1876061 1876062 1876063 ... > $ theta : num 6448 7260 7878 6151 12230 ... > $ sdTheta : num 1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ... > $ outliers: logi FALSE FALSE FALSE FALSE FALSE FALSE ... > Transforming signals to estimates...done > Writing estimates... > Storing flat data to file... > Updating file... > Updating file...done > Storing flat data to file...done > Writing estimates...done > Array #1 ('91228050T') of 1...done > used (Mb) gc trigger (Mb) max used (Mb) > Ncells 483738 25.9 1733216 92.6 12913475 689.7 > Vcells 28397870 216.7 83594476 637.8 130588565 996.4 > Fitting 1 arrays...done > Estimating single-probe CN units...done >> str(units1); > int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 > 935598 935599 ... >> units <- findUnitsTodo(plm); >> str(units); > int [1:935589] 1 2 3 4 5 6 7 8 9 10 ... > > On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote: >> >> On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache >> <ioana.cu...@gmail.com> wrote: >> > Ok, so the first part ran without any errors, output below: >> > >> >> print(csN); >> > AffymetrixCelSet: >> > Name: test >> > Tags: ACC,ra,-XY,BPN,-XY >> > Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6 >> > Platform: Affymetrix >> > Chip type: GenomeWideSNP_6,Full >> > Number of arrays: 1 >> > Names: 91228050T [1] >> > Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51 >> > Total file size: 65.83MB >> > RAM: 0.01MB >> >> print(getFullNames(csN)); >> > [1] "91228050T" >> >> cdf2 <- getCdf(csN); >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT >> >> >> > >> > However I am not getting same as you for the units: >> > >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); >> >> units <- findUnitsTodo(plm); >> >> str(units); >> > int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... >> >> Alright (but has nothing to do with the error you first reported). If so, >> do: >> >> units1 <- fitCnProbes(plm, verbose=verbose); >> str(units1); >> ## int [1:945826] 935590 935591 935592 935593 935594 935595 935596 >> 935597 935598 935599 ... >> >> and you should get the same as: >> >> units <- findUnitsTodo(plm); >> str(units); >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... >> >> If so, the try with >> >> units2 <- fit(plm, verbose=-10); >> >> and compare to the output I already pasted in my previous message (below). >> >> /Henrik >> >> > >> > >> > On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote: >> >> >> >> Hmm... first run this in a fresh R session: >> >> >> >> library("aroma.affymetrix"); >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf); >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2"); >> >> csC <- process(acc, verbose=verbose); >> >> bpn <- BasePositionNormalization(csC, target="zero"); >> >> csN <- process(bpn, verbose=verbose); >> >> >> >> then what's the output of: >> >> print(csN); >> >> print(getFullNames(csN)); >> >> cdf2 <- getCdf(csN); >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT >> >> >> >> If you get an error above, what is the output of print(cdf2) and >> >> getChecksum(cdf2)? If not, continue: >> >> >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); >> >> units <- findUnitsTodo(plm); >> >> str(units); >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... >> >> >> >> Do you get the same? If so, continue: >> >> >> >> units2 <- fit(plm, verbose=-10); >> >> >> >> You should get something like the following, except from the >> >> dataset/sample names (if not, let me know): >> >> >> >> ## Fitting model of class AvgCnPlm... >> >> ## AvgCnPlm: >> >> ## Data set: GSE13372 >> >> ## Chip type: GenomeWideSNP_6,Full >> >> ## Input tags: testset,ACC,ra,-XY,BPN,-XY >> >> ## Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B >> >> ## Parameters: {probeModel: chr "pm", shift: num 0, flavor: chr >> >> "median", mergeStrands: logi TRUE, combineAlleles: logi TRUE} >> >> ## Path: >> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6 >> >> ## RAM: 0.01MB >> >> ## Identifying non-estimated units... >> >> ## Identifying non-fitted units in chip-effect file... >> >> ## Pathname: >> >> >> >> >> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL >> >> ## Found indices cached on file >> >> ## Reading data for these 1881415 cells... >> >> ## Reading data for these 1881415 cells...done >> >> ## Looking for stdvs <= 0 indicating non-estimated units: >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... >> >> ## Identifying non-fitted units in chip-effect file...done >> >> ## Identifying non-estimated units...done >> >> ## Getting model fit for 934968 units. >> >> ## Identifying unit types:... >> >> ## Units: >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... >> >> ## Unit types: >> >> ## atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ... >> >> ## - attr(*, "types")= Named int [1:8] 0 1 2 3 4 5 6 7 >> >> ## ..- attr(*, "names")= chr [1:8] "unknown" "expression" >> >> "genotyping" "resequencing" ... >> >> ## unitTypes >> >> ## 2 >> >> ## 934968 >> >> ## Unique unit types: >> >> ## genotyping >> >> ## 2 >> >> ## Identifying unit types:...done >> >> ## Setting up parameter sets... >> >> ## Creating CEL file... >> >> ## Chip type: GenomeWideSNP_6,Full >> >> ## Pathname: >> >> >> >> >> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL >> >> ## Returning already existing file. >> >> ## Creating CEL file...done >> >> ## used (Mb) gc trigger (Mb) max used (Mb) >> >> ## Ncells 549305 29.4 1606180 85.8 13015638 695.2 >> >> ## Vcells 7692698 58.7 23004541 175.6 40154683 306.4 >> >> ## Setting up parameter sets...done >> >> ## Fitting AvgCnPlm for each unit type separately... >> >> ## Unit types: >> >> ## genotyping >> >> ## 2 >> >> ## Unit type #1 ('genotyping') of 1... >> >> ## Unit type: genotyping (code=2) >> >> ## Number of units of this type: 934968 >> >> ## int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... >> >> ## Fitting the model by unit dimensions (at least for the large >> >> classes)... >> >> ## Grouping units into equivalent (unit,group,cell) dimensions... >> >> [...] >> >> >> >> What is particular interesting are the outputs of your unit types and >> >> unit indices. >> >> >> >> /Henrik >> >> >> >> On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache >> >> <ioana.cu...@gmail.com> wrote: >> >> >> library("aroma.affymetrix") >> >> >> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE) >> >> >> options(digits=4) >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") >> >> >> print(cdf) >> >> >> gi <- getGenomeInformation(cdf) >> >> >> print(gi) >> >> >> si <- getSnpInformation(cdf) >> >> >> print(si) >> >> >> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, >> >> >> fullname=FALSE)) >> >> >> print(acs) >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") >> >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf) >> >> >> print(csR) >> >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2") >> >> >> print(acc) >> >> >> csC <- process(acc, verbose=verbose) >> >> >> print(csC) >> >> >> bpn <- BasePositionNormalization(csC, target="zero") >> >> >> print(bpn) >> >> >> csN <- process(bpn, verbose=verbose) >> >> >> print(csN) >> > >> > -- >> > When reporting problems on aroma.affymetrix, make sure 1) to run the >> > latest >> > version of the package, 2) to report the output of sessionInfo() and >> > traceback(), and 3) to post a complete code example. >> > >> > >> > You received this message because you are subscribed to the Google >> > Groups >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. >> > To post to this group, send email to aroma-af...@googlegroups.com >> > To unsubscribe and other options, go to >> > http://www.aroma-project.org/forum/ > > -- > When reporting problems on aroma.affymetrix, make sure 1) to run the latest > version of the package, 2) to report the output of sessionInfo() and > traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google Groups > "aroma.affymetrix" group with website http://www.aroma-project.org/. > To post to this group, send email to aroma-affymetrix@googlegroups.com > To unsubscribe and other options, go to http://www.aroma-project.org/forum/ -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/