Re: [aroma.affymetrix] problem using CRMA v2

Henrik Bengtsson Tue, 27 Nov 2012 11:41:18 -0800

Hi,

this certainly looks weird and it's very puzzling.  My last guess it
that your so called monocell CDF is corrupt (but I cannot see how that
happened).  Verify that you get the same:


library("aroma.affymetrix");

cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
# ASSERT CORRECTNESS
stopifnot(identical(getChecksum(cdf), "3fbe0f6e7c8a346105238a3f3d10d4ec"))

cdfM <- getMonocellCdf(cdf);
print(cdfM);
## AffymetrixCdfFile:
## Path: annotationData/chipTypes/GenomeWideSNP_6
## Filename: GenomeWideSNP_6,Full,monocell.CDF
## File size: 416.27 MB (436492121 bytes)
## Chip type: GenomeWideSNP_6,Full,monocell
## RAM: 0.00MB
## File format: v4 (binary; XDA)
## Dimension: 1681x1681
## Number of cells: 2825761
## Number of units: 1881415
## Cells per unit: 1.50
## Number of QC units: 4

# ASSERT CORRECTNESS
stopifnot(identical(getChecksum(cdfM), "3a797e24a62f4f2e5214a6bf819151da"));


If you get an error at the latter step, then I recommend that you
remove that file by doing:

pathnameM <- getPathname(cdfM);
file.remove(pathnameM);
stopifnot(!isFile(pathnameM));

and then also clear the file cache for aroma.affymetrix by doing:

path <- getCachePath(dirs="aroma.affymetrix");
removeDirectory(path, recursive=TRUE);

Then regenerate the monocell CDF (it will take a long time):

cdfM <- getMonocellCdf(cdf, verbose=-10);

and again compare with the output I get, particularly:

stopifnot(identical(getChecksum(cdfM), "3a797e24a62f4f2e5214a6bf819151da"));

If get to this point, it means that was the problem and you should be
able to start your analysis.


I'm really sorry that you get hit by this - it's really odd and it's
extremely rare that someone has to got through all this mess to get
things working - it should just work out of the box and hopefully it
will do after this.

/Henrik

On Tue, Nov 27, 2012 at 2:39 AM, Ioana Cutcutache
<ioana.cutcuta...@gmail.com> wrote:
> I am still not able to get same as you for units (so if I continue doing
> units2 <- fit(plm, verbose=-10) I am not getting the right thing either).
> I ran everything again and here is the output of the last part:
>
>> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE)
>> units1 <- fitCnProbes(plm, verbose=verbose);
> Estimating single-probe CN units...
>  Getting chip-effect set...
>   Getting chip-effect set from data set...
>   Getting chip-effect set from data set...done
>  Getting chip-effect set...done
>  Identifying CN units...
>    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
> 935598 935599 ...
>  Identifying CN units...done
>  Identifying subset to fit...
>   Identifying non-fitted units in chip-effect file...
>    Pathname:
> plmData/test,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/91228050T,chipEffects.CEL
>    Found indices cached on file
>    Reading data for these 1881415 cells...
>    Reading data for these 1881415 cells...done
>    Looking for stdvs <= 0 indicating non-estimated units:
>     int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ...
>   Identifying non-fitted units in chip-effect file...done
>    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
> 935598 935599 ...
>  Identifying subset to fit...done
>  Identifying cell indices for CN units...
>  Identifying cell indices for CN units...done
>  Keeping only single-cell units...
>   Single-cell units:
>    int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
> 935598 935599 ...
>   Cell indices:
>    int [1:945826] 3435872 3403913 422133 2456228 4388533 5792832 3788106
> 3660159 3524469 3264141 ...
>  Keeping only single-cell units...done
>  Identifying cell indices for estimates...
>    int [1:945826] 1876054 1876055 1876056 1876057 1876058 1876059 1876060
> 1876061 1876062 1876063 ...
>  Identifying cell indices for estimates...done
>  Fitting 1 arrays...
>   Array #1 ('91228050T') of 1...
>    Reading signals...
>      num [1:945826] 6448 7260 7878 6151 12230 ...
>    Reading signals...done
>    Transforming signals to estimates...
>     'data.frame':       945826 obs. of  4 variables:
>      $ cell    : int  1876054 1876055 1876056 1876057 1876058 1876059
> 1876060 1876061 1876062 1876063 ...
>      $ theta   : num  6448 7260 7878 6151 12230 ...
>      $ sdTheta : num  1.49e-08 1.49e-08 1.49e-08 1.49e-08 1.49e-08 ...
>      $ outliers: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
>    Transforming signals to estimates...done
>    Writing estimates...
>     Storing flat data to file...
>      Updating file...
>      Updating file...done
>     Storing flat data to file...done
>    Writing estimates...done
>   Array #1 ('91228050T') of 1...done
>              used  (Mb) gc trigger  (Mb)  max used  (Mb)
>   Ncells   483738  25.9    1733216  92.6  12913475 689.7
>   Vcells 28397870 216.7   83594476 637.8 130588565 996.4
>  Fitting 1 arrays...done
> Estimating single-probe CN units...done
>> str(units1);
>  int [1:945826] 935590 935591 935592 935593 935594 935595 935596 935597
> 935598 935599 ...
>> units <- findUnitsTodo(plm);
>> str(units);
>  int [1:935589] 1 2 3 4 5 6 7 8 9 10 ...
>
> On Monday, 26 November 2012 23:08:47 UTC+2, Henrik Bengtsson wrote:
>>
>> On Mon, Nov 26, 2012 at 12:46 PM, Ioana Cutcutache
>> <ioana.cu...@gmail.com> wrote:
>> > Ok, so the first part ran without any errors, output below:
>> >
>> >> print(csN);
>> > AffymetrixCelSet:
>> > Name: test
>> > Tags: ACC,ra,-XY,BPN,-XY
>> > Path: probeData/test,ACC,ra,-XY,BPN,-XY/GenomeWideSNP_6
>> > Platform: Affymetrix
>> > Chip type: GenomeWideSNP_6,Full
>> > Number of arrays: 1
>> > Names: 91228050T [1]
>> > Time period: 2011-07-19 05:08:51 -- 2011-07-19 05:08:51
>> > Total file size: 65.83MB
>> > RAM: 0.01MB
>> >> print(getFullNames(csN));
>> > [1] "91228050T"
>> >> cdf2 <- getCdf(csN);
>> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT
>> >>
>> >
>> > However I am not getting same as you for the units:
>> >
>> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE);
>> >> units <- findUnitsTodo(plm);
>> >> str(units);
>> >  int [1:1881415] 1 2 3 4 5 6 7 8 9 10 ...
>>
>> Alright (but has nothing to do with the error you first reported).  If so,
>> do:
>>
>> units1 <- fitCnProbes(plm, verbose=verbose);
>> str(units1);
>> ##  int [1:945826] 935590 935591 935592 935593 935594 935595 935596
>> 935597 935598 935599 ...
>>
>> and you should get the same as:
>>
>> units <- findUnitsTodo(plm);
>> str(units);
>> ##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
>>
>> If so, the try with
>>
>> units2 <- fit(plm, verbose=-10);
>>
>> and compare to the output I already pasted in my previous message (below).
>>
>> /Henrik
>>
>> >
>> >
>> > On Monday, 26 November 2012 21:15:32 UTC+2, Henrik Bengtsson wrote:
>> >>
>> >> Hmm... first run this in a fresh R session:
>> >>
>> >> library("aroma.affymetrix");
>> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
>> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf);
>> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2");
>> >> csC <- process(acc, verbose=verbose);
>> >> bpn <- BasePositionNormalization(csC, target="zero");
>> >> csN <- process(bpn, verbose=verbose);
>> >>
>> >> then what's the output of:
>> >> print(csN);
>> >> print(getFullNames(csN));
>> >> cdf2 <- getCdf(csN);
>> >> stopifnot(identical(getChecksum(cdf2), getChecksum(cdf))); # IMPORTANT
>> >>
>> >> If you get an error above, what is the output of print(cdf2) and
>> >> getChecksum(cdf2)?  If not, continue:
>> >>
>> >> plm <- AvgCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE);
>> >> units <- findUnitsTodo(plm);
>> >> str(units);
>> >> ##  int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
>> >>
>> >> Do you get the same?  If so, continue:
>> >>
>> >> units2 <- fit(plm, verbose=-10);
>> >>
>> >> You should get something like the following, except from the
>> >> dataset/sample names (if not, let me know):
>> >>
>> >> ## Fitting model of class AvgCnPlm...
>> >> ##  AvgCnPlm:
>> >> ##  Data set: GSE13372
>> >> ##  Chip type: GenomeWideSNP_6,Full
>> >> ##  Input tags: testset,ACC,ra,-XY,BPN,-XY
>> >> ##  Output tags: testset,ACC,ra,-XY,BPN,-XY,AVG,A+B
>> >> ##  Parameters: {probeModel: chr "pm", shift: num 0, flavor: chr
>> >> "median", mergeStrands: logi TRUE, combineAlleles: logi TRUE}
>> >> ##  Path:
>> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6
>> >> ##  RAM: 0.01MB
>> >> ##  Identifying non-estimated units...
>> >> ##   Identifying non-fitted units in chip-effect file...
>> >> ##    Pathname:
>> >>
>> >>
>> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/GSM337708,chipEffects.CEL
>> >> ##    Found indices cached on file
>> >> ##    Reading data for these 1881415 cells...
>> >> ##    Reading data for these 1881415 cells...done
>> >> ##    Looking for stdvs <= 0 indicating non-estimated units:
>> >> ##     int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
>> >> ##   Identifying non-fitted units in chip-effect file...done
>> >> ##  Identifying non-estimated units...done
>> >> ##  Getting model fit for 934968 units.
>> >> ##  Identifying unit types:...
>> >> ##   Units:
>> >> ##    int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
>> >> ##   Unit types:
>> >> ##    atomic [1:934968] 2 2 2 2 2 2 2 2 2 2 ...
>> >> ##    - attr(*, "types")= Named int [1:8] 0 1 2 3 4 5 6 7
>> >> ##     ..- attr(*, "names")= chr [1:8] "unknown" "expression"
>> >> "genotyping" "resequencing" ...
>> >> ##   unitTypes
>> >> ##        2
>> >> ##   934968
>> >> ##   Unique unit types:
>> >> ##   genotyping
>> >> ##            2
>> >> ##  Identifying unit types:...done
>> >> ##  Setting up parameter sets...
>> >> ##   Creating CEL file...
>> >> ##    Chip type: GenomeWideSNP_6,Full
>> >> ##    Pathname:
>> >>
>> >>
>> >> plmData/GSE13372,testset,ACC,ra,-XY,BPN,-XY,AVG,A+B/GenomeWideSNP_6/probeAffinities.CEL
>> >> ##    Returning already existing file.
>> >> ##   Creating CEL file...done
>> >> ##             used (Mb) gc trigger  (Mb) max used  (Mb)
>> >> ##   Ncells  549305 29.4    1606180  85.8 13015638 695.2
>> >> ##   Vcells 7692698 58.7   23004541 175.6 40154683 306.4
>> >> ##  Setting up parameter sets...done
>> >> ##  Fitting AvgCnPlm for each unit type separately...
>> >> ##   Unit types:
>> >> ##   genotyping
>> >> ##            2
>> >> ##   Unit type #1 ('genotyping') of 1...
>> >> ##    Unit type: genotyping (code=2)
>> >> ##    Number of units of this type: 934968
>> >> ##     int [1:934968] 622 623 624 625 626 627 628 629 630 631 ...
>> >> ##    Fitting the model by unit dimensions (at least for the large
>> >> classes)...
>> >> ##     Grouping units into equivalent (unit,group,cell) dimensions...
>> >> [...]
>> >>
>> >> What is particular interesting are the outputs of your unit types and
>> >> unit indices.
>> >>
>> >> /Henrik
>> >>
>> >> On Mon, Nov 26, 2012 at 10:45 AM, Ioana Cutcutache
>> >> <ioana.cu...@gmail.com> wrote:
>> >> >> library("aroma.affymetrix")
>> >> >> log <- verbose <- Arguments$getVerbose(-8, timestamp=TRUE)
>> >> >> options(digits=4)
>> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>> >> >> print(cdf)
>> >> >> gi <- getGenomeInformation(cdf)
>> >> >> print(gi)
>> >> >> si <- getSnpInformation(cdf)
>> >> >> print(si)
>> >> >> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf,
>> >> >> fullname=FALSE))
>> >> >> print(acs)
>> >> >> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>> >> >> csR <- AffymetrixCelSet$byName("test", cdf=cdf)
>> >> >> print(csR)
>> >> >> acc <- AllelicCrosstalkCalibration(csR, model="CRMAv2")
>> >> >> print(acc)
>> >> >> csC <- process(acc, verbose=verbose)
>> >> >> print(csC)
>> >> >> bpn <- BasePositionNormalization(csC, target="zero")
>> >> >> print(bpn)
>> >> >> csN <- process(bpn, verbose=verbose)
>> >> >> print(csN)
>> >
>> > --
>> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>> > latest
>> > version of the package, 2) to report the output of sessionInfo() and
>> > traceback(), and 3) to post a complete code example.
>> >
>> >
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> > To post to this group, send email to aroma-af...@googlegroups.com
>> > To unsubscribe and other options, go to
>> > http://www.aroma-project.org/forum/
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

Re: [aroma.affymetrix] problem using CRMA v2

Reply via email to