Hi Henrik,

I just did what you suggested and seems it's all ok, the checksum is fine, 
output below.
Any idea what else might be causing the problem?

> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); 
> # ASSERT CORRECTNESS 
> stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec"))
> cdfM <- getMonocellCdf(cdf); 
> print(cdfM);
AffymetrixCdfFile:
Path: annotationData/chipTypes/GenomeWideSNP_6
Filename: GenomeWideSNP_6,Full,monocell.CDF
Filesize: 416.27MB
Chip type: GenomeWideSNP_6,Full,monocell
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 1681x1681
Number of cells: 2825761
Number of units: 1881415
Cells per unit: 1.50
Number of QC units: 4
> stopifnot(identical(getChecksum(cdfM), 
"3a797e24a62f4f2e5214a6bf819151da")); 
> getChecksum(cdfM)
[1] "3a797e24a62f4f2e5214a6bf819151da"

Ioana

On Tuesday, 27 November 2012 21:40:54 UTC+2, Henrik Bengtsson wrote:
>
> Hi, 
>
> this certainly looks weird and it's very puzzling.  My last guess it 
> that your so called monocell CDF is corrupt (but I cannot see how that 
> happened).  Verify that you get the same: 
>
> library("aroma.affymetrix"); 
>
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); 
> # ASSERT CORRECTNESS 
> stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec")) 
>
> cdfM <- getMonocellCdf(cdf); 
> print(cdfM); 
> ## AffymetrixCdfFile: 
> ## Path: annotationData/chipTypes/GenomeWideSNP_6 
> ## Filename: GenomeWideSNP_6,Full,monocell.CDF 
> ## File size: 416.27 MB (436492121 bytes) 
> ## Chip type: GenomeWideSNP_6,Full,monocell 
> ## RAM: 0.00MB 
> ## File format: v4 (binary; XDA) 
> ## Dimension: 1681x1681 
> ## Number of cells: 2825761 
> ## Number of units: 1881415 
> ## Cells per unit: 1.50 
> ## Number of QC units: 4 
>
> # ASSERT CORRECTNESS 
> stopifnot(identical(getChecksum(cdfM), 
> "3a797e24a62f4f2e5214a6bf819151da")); 
>
>
> If you get an error at the latter step, then I recommend that you 
> remove that file by doing: 
>
> pathnameM <- getPathname(cdfM); 
> file.remove(pathnameM); 
> stopifnot(!isFile(pathnameM)); 
>
> and then also clear the file cache for aroma.affymetrix by doing: 
>
> path <- getCachePath(dirs="aroma.affymetrix"); 
> removeDirectory(path, recursive=TRUE); 
>
> Then regenerate the monocell CDF (it will take a long time): 
>
> cdfM <- getMonocellCdf(cdf, verbose=-10); 
>
> and again compare with the output I get, particularly: 
>
> stopifnot(identical(getChecksum(cdfM), 
> "3a797e24a62f4f2e5214a6bf819151da")); 
>
> If get to this point, it means that was the problem and you should be 
> able to start your analysis. 
>
>
> I'm really sorry that you get hit by this - it's really odd and it's 
> extremely rare that someone has to got through all this mess to get 
> things working - it should just work out of the box and hopefully it 
> will do after this. 
>
> /Henrik 
>
> On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache 
> <ioana.cu...@gmail.com <javascript:>> wrote: 
> > I am still not able to get same as you for units (so if I continue doing 
> > units2 <- fit(plm, verbose=-10) I am not getting the right thing 
> either). 
> > I ran everything again and here is the output of the last part: 
> > 
> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) 
> >> units1 <- fitCnProbes(plm, verbose=verbose); 
> > Estimating single-probe CN units... 
> >  Getting chip-effect set... 
> >   Getting chip-effect set from data set... 
> >   Getting chip-effect set from data set...done 
> >  Getting chip-effect set...done 
> >  Identifying CN units... 
> >    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 
> 935597 
> > 935598 935599 ... 
> >  Identifying CN units...done 
> >  Identifying subset to fit... 
> >   Identifying non-fitted units in chip-effect file... 
> >    Pathname: 
> > 
> plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL
>  
>
> >    Found indices cached on file 
> >    Reading data for these 1881415 cells... 
> >    Reading data for these 1881415 cells...done 
> >    Looking for stdvs <= 0 indicating non-estimated units: 
> >     int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... 
> >   Identifying non-fitted units in chip-effect file...done 
> >    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 
> 935597 
> > 935598 935599 ... 
> >  Identifying subset to fit...done 
> >  Identifying cell indices for CN units... 
> >  Identifying cell indices for CN units...done 
> >  Keeping only single-cell units... 
> >   Single-cell units: 
> >    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 
> 935597 
> > 935598 935599 ... 
> >   Cell indices: 
> >    int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106 
> > 3660159 3524469 3264141 ... 
> >  Keeping only single-cell units...done 
> >  Identifying cell indices for estimates... 
> >    int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 
> 1876060 
> > 1876061 1876062 1876063 ... 
> >  Identifying cell indices for estimates...done 
> >  Fitting 1 arrays... 
> >   Array #1 ('91228050T') of 1... 
> >    Reading signals... 
> >      num [1:945826] 6448 7260 7878 6151 12230 ... 
> >    Reading signals...done 
> >    Transforming signals to estimates... 
> >     'data.frame':       945826 obs. of  4 variables: 
> >      $ cell    : int  1876054 1876055 1876056 1876057 1876058 1876059 
> > 1876060 1876061 1876062 1876063 ... 
> >      $ theta   : num  6448 7260 7878 6151 12230 ... 
> >      $ sdTheta : num  1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ... 
> >      $ outliers: logi  FALSE FALSE FALSE FALSE FALSE FALSE ... 
> >    Transforming signals to estimates...done 
> >    Writing estimates... 
> >     Storing flat data to file... 
> >      Updating file... 
> >      Updating file...done 
> >     Storing flat data to file...done 
> >    Writing estimates...done 
> >   Array #1 ('91228050T') of 1...done 
> >              used  (Mb) gc trigger  (Mb)  max used  (Mb) 
> >   Ncells   483738  25.9    1733216  92.6  12913475 689.7 
> >   Vcells 28397870 216.7   83594476 637.8 130588565 996.4 
> >  Fitting 1 arrays...done 
> > Estimating single-probe CN units...done 
> >> str(units1); 
> >  int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597 
> > 935598 935599 ... 
> >> units <- findUnitsTodo(plm); 
> >> str(units); 
> >  int [1:935589] 1 2 3 4 5 6 7 8 9 10 ... 
> > 
> > On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote: 
> >> 
> >> On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache 
> >> <ioana.cu...@gmail.com> wrote: 
> >> > Ok, so the first part ran without any errors, output below: 
> >> > 
> >> >> print(csN); 
> >> > AffymetrixCelSet: 
> >> > Name: test 
> >> > Tags: ACC,ra,-XY,BPN,-XY 
> >> > Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6 
> >> > Platform: Affymetrix 
> >> > Chip type: GenomeWideSNP_6,Full 
> >> > Number of arrays: 1 
> >> > Names: 91228050T [1] 
> >> > Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51 
> >> > Total file size: 65.83MB 
> >> > RAM: 0.01MB 
> >> >> print(getFullNames(csN)); 
> >> > [1] "91228050T" 
> >> >> cdf2 <- getCdf(csN); 
> >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # 
> IMPORTANT 
> >> >> 
> >> > 
> >> > However I am not getting same as you for the units: 
> >> > 
> >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); 
> >> >> units <- findUnitsTodo(plm); 
> >> >> str(units); 
> >> >  int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ... 
> >> 
> >> Alright (but has nothing to do with the error you first reported).  If 
> so, 
> >> do: 
> >> 
> >> units1 <- fitCnProbes(plm, verbose=verbose); 
> >> str(units1); 
> >> ##  int [1:945826] 935590 935591 935592 935593 935594 935595 935596 
> >> 935597 935598 935599 ... 
> >> 
> >> and you should get the same as: 
> >> 
> >> units <- findUnitsTodo(plm); 
> >> str(units); 
> >> ##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 
> >> 
> >> If so, the try with 
> >> 
> >> units2 <- fit(plm, verbose=-10); 
> >> 
> >> and compare to the output I already pasted in my previous message 
> (below). 
> >> 
> >> /Henrik 
> >> 
> >> > 
> >> > 
> >> > On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote: 
> >> >> 
> >> >> Hmm... first run this in a fresh R session: 
> >> >> 
> >> >> library("aroma.affymetrix"); 
> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); 
> >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf); 
> >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2"); 
> >> >> csC <- process(acc, verbose=verbose); 
> >> >> bpn <- BasePositionNormalization(csC, target="zero"); 
> >> >> csN <- process(bpn, verbose=verbose); 
> >> >> 
> >> >> then what's the output of: 
> >> >> print(csN); 
> >> >> print(getFullNames(csN)); 
> >> >> cdf2 <- getCdf(csN); 
> >> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # 
> IMPORTANT 
> >> >> 
> >> >> If you get an error above, what is the output of print(cdf2) and 
> >> >> getChecksum(cdf2)?  If not, continue: 
> >> >> 
> >> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE); 
> >> >> units <- findUnitsTodo(plm); 
> >> >> str(units); 
> >> >> ##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 
> >> >> 
> >> >> Do you get the same?  If so, continue: 
> >> >> 
> >> >> units2 <- fit(plm, verbose=-10); 
> >> >> 
> >> >> You should get something like the following, except from the 
> >> >> dataset/sample names (if not, let me know): 
> >> >> 
> >> >> ## Fitting model of class AvgCnPlm... 
> >> >> ##  AvgCnPlm: 
> >> >> ##  Data set: GSE13372 
> >> >> ##  Chip type: GenomeWideSNP_6,Full 
> >> >> ##  Input tags: testset,ACC,ra,-XY,BPN,-XY 
> >> >> ##  Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B 
> >> >> ##  Parameters: {probeModel: chr "pm", shift: num 0, flavor: chr 
> >> >> "median", mergeStrands: logi TRUE, combineAlleles: logi TRUE} 
> >> >> ##  Path: 
> >> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6 
> >> >> ##  RAM: 0.01MB 
> >> >> ##  Identifying non-estimated units... 
> >> >> ##   Identifying non-fitted units in chip-effect file... 
> >> >> ##    Pathname: 
> >> >> 
> >> >> 
> >> >> 
> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL
>  
>
> >> >> ##    Found indices cached on file 
> >> >> ##    Reading data for these 1881415 cells... 
> >> >> ##    Reading data for these 1881415 cells...done 
> >> >> ##    Looking for stdvs <= 0 indicating non-estimated units: 
> >> >> ##     int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 
> >> >> ##   Identifying non-fitted units in chip-effect file...done 
> >> >> ##  Identifying non-estimated units...done 
> >> >> ##  Getting model fit for 934968 units. 
> >> >> ##  Identifying unit types:... 
> >> >> ##   Units: 
> >> >> ##    int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 
> >> >> ##   Unit types: 
> >> >> ##    atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ... 
> >> >> ##    - attr(*, "types")= Named int [1:8] 0 1 2 3 4 5 6 7 
> >> >> ##     ..- attr(*, "names")= chr [1:8] "unknown" "expression" 
> >> >> "genotyping" "resequencing" ... 
> >> >> ##   unitTypes 
> >> >> ##        2 
> >> >> ##   934968 
> >> >> ##   Unique unit types: 
> >> >> ##   genotyping 
> >> >> ##            2 
> >> >> ##  Identifying unit types:...done 
> >> >> ##  Setting up parameter sets... 
> >> >> ##   Creating CEL file... 
> >> >> ##    Chip type: GenomeWideSNP_6,Full 
> >> >> ##    Pathname: 
> >> >> 
> >> >> 
> >> >> 
> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL
>  
>
> >> >> ##    Returning already existing file. 
> >> >> ##   Creating CEL file...done 
> >> >> ##             used (Mb) gc trigger  (Mb) max used  (Mb) 
> >> >> ##   Ncells  549305 29.4    1606180  85.8 13015638 695.2 
> >> >> ##   Vcells 7692698 58.7   23004541 175.6 40154683 306.4 
> >> >> ##  Setting up parameter sets...done 
> >> >> ##  Fitting AvgCnPlm for each unit type separately... 
> >> >> ##   Unit types: 
> >> >> ##   genotyping 
> >> >> ##            2 
> >> >> ##   Unit type #1 ('genotyping') of 1... 
> >> >> ##    Unit type: genotyping (code=2) 
> >> >> ##    Number of units of this type: 934968 
> >> >> ##     int [1:934968] 622 623 624 625 626 627 628 629 630 631 ... 
> >> >> ##    Fitting the model by unit dimensions (at least for the large 
> >> >> classes)... 
> >> >> ##     Grouping units into equivalent (unit,group,cell) 
> dimensions... 
> >> >> [...] 
> >> >> 
> >> >> What is particular interesting are the outputs of your unit types 
> and 
> >> >> unit indices. 
> >> >> 
> >> >> /Henrik 
> >> >> 
> >> >> On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache 
> >> >> <ioana.cu...@gmail.com> wrote: 
> >> >> >> library("aroma.affymetrix") 
> >> >> >> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE) 
> >> >> >> options(digits=4) 
> >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", 
> tags="Full") 
> >> >> >> print(cdf) 
> >> >> >> gi <- getGenomeInformation(cdf) 
> >> >> >> print(gi) 
> >> >> >> si <- getSnpInformation(cdf) 
> >> >> >> print(si) 
> >> >> >> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, 
> >> >> >> fullname=FALSE)) 
> >> >> >> print(acs) 
> >> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", 
> tags="Full") 
> >> >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf) 
> >> >> >> print(csR) 
> >> >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2") 
> >> >> >> print(acc) 
> >> >> >> csC <- process(acc, verbose=verbose) 
> >> >> >> print(csC) 
> >> >> >> bpn <- BasePositionNormalization(csC, target="zero") 
> >> >> >> print(bpn) 
> >> >> >> csN <- process(bpn, verbose=verbose) 
> >> >> >> print(csN) 
> >> > 
> >> > -- 
> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
> >> > latest 
> >> > version of the package, 2) to report the output of sessionInfo() and 
> >> > traceback(), and 3) to post a complete code example. 
> >> > 
> >> > 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. 
> >> > To post to this group, send email to aroma-af...@googlegroups.com 
> >> > To unsubscribe and other options, go to 
> >> > http://www.aroma-project.org/forum/ 
> > 
> > -- 
> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
> latest 
> > version of the package, 2) to report the output of sessionInfo() and 
> > traceback(), and 3) to post a complete code example. 
> > 
> > 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "aroma.affymetrix" group with website http://www.aroma-project.org/. 
> > To post to this group, send email to 
> > aroma-af...@googlegroups.com<javascript:> 
> > To unsubscribe and other options, go to 
> http://www.aroma-project.org/forum/ 
>

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

Reply via email to