Re: [aroma.affymetrix] Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis

Henrik Bengtsson Thu, 05 Aug 2010 07:13:10 -0700

[I'm sorry, but I wrote a reply on June 24 that was never sent off - I
just found it sitting in my draft folder. Anyway, here it is.]


Hi,

On Wed, Jun 23, 2010 at 2:24 PM, Johan Staaf <johan.st...@med.lu.se> wrote:
> Hi Henrik
> It is correct that your CRMAv2 "tutorial" was used as base, sorry for that.
> The script below generated nice profiles for other Sty data (e.g. GSE7545,
> GSE19399) as well as Nsp chips (>1500 chips). I have only had problem with
> certain GSE data sets, e.g.  GSE14994, GSE12896. Samples from GSE14994 look
> the weirdest, however for other data sets with 500K data the Nsp data
> frequently look "better" than corresponding Sty data (?). Furthermore, there
> appears to be Sty2 and Sty1 chips in GEO (sloppy annotations?), however they
> all point to the same GPL3720, so I was wondering if this could be the
> reason. However when I used Sty2 chips from GSE5173 as normal reference the
> strange pattern was still there.

I also noticed that there are "Sty 1" and "Sty 2" data sets - to the
best of my knowledge there has only been one Sty chip and that is
formally called Mapping250K_Sty.   I've dropped a message on
Affymetrix Scientific Forum
[https://www.affymetrix.com/community/forums/index.jspa - Forum:
General] asking them to verify this.  [That was on June 24.  Now, on
August 5, there is still now reply to my post there.]

Your script looks alright to me.

I downloaded

ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM374nnn/GSM374371/GSM374371.cel.gz
ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM182nnn/GSM182884/GSM182884.CEL.gz

where the first is from GSE14994 and the second from GSE7545.

I ran through CRMAv2 (with AvgCnPlm) on just these two, and calculated
the CN ratios as GSM374371/GSM182884.  Although more noisy, I also see
two "CN bands".   When plotting the summaries (not the ratios) from
GSM374371 along the genome, you can see two bands as well.  That's not
the case for GSM182884.  This suggest that the GSM374371 array cause
those two bands.

I still don't know what causes this.  Do you see this for all samples
in the problematic data set GSE14994, or just a few?  I recommend that
you ask the authors/producers of this particular data set, because it
sounds like a data specific issue.

/Henrik

>
> I used R 2.11.1 and aroma.affymetrix v1.6.0 (Mac)
>
> Here is the code:
>       cesNList <- list()
>             print(paste("doing chip type:",chipType))
>       cs <- AffymetrixCelSet$byName(data.set.name, chipType=chipType);
>       norm.set<-"NormalSamples"
>       #if(chipType=="Mapping250K_Sty") norm.set<-"NormalSamplesSTY2"
>       cs.normal <- AffymetrixCelSet$byName(norm.set, chipType=chipType);
> #norm.set contains CEL files for normal samples
>       # Append data sets
>       append(cs,cs.normal);
>         cs <- extract(cs, !isDuplicated(cs))
>       print(cs)
>       print("..Done reading CEL files")
>
>       ##### Calibration for allelic crosstalk  #######
>       print("..Starting calibration for allelic crosstalk")
>       acc <- AllelicCrosstalkCalibration(cs,model="CRMAv2")
>       print(acc)
>       csC <- process(acc, verbose=verbose)
>       print(csC)
>       print("..Finished calibration for allelic crosstalk")
>       #####
>
>       ##### Normalization for nucleotide-position probe sequence effects
> #####
>       print("..Starting Normalization for nucleotide-position probe sequence
> effects")
>       bpn <- BasePositionNormalization(csC, target="zero")
>       print(bpn)
>       csN <- process(bpn, verbose=verbose)
>       print(csN)
>       print("..Finished Normalization for nucleotide-position probe sequence
> effects")
>       #####
>           ### Probe Summarization #####
>       #RmaSnpPlm
>       print("..Starting Probe Summarization")
>       plm <- RmaCnPlm(csN, mergeStrands=TRUE, combineAlleles=TRUE) # for
> 10k-500K arrays
>       #plm <- AvgCnPlm(csC, mergeStrands=TRUE, combineAlleles=TRUE,
> shift=+300)
>       #plm<-RmaCnPlm(csC,combineAlleles=TRUE,mergeStrands=TRUE) #requires >2
> arrays
>       print(plm)
>       fit(plm,verbose=verbose)
>       print("..Finished Probe Summarization")
>       #####
>
>
>       ### PCR fragment normalization ####
>       print("..Starting PCR fragment normalization")
>       ces <- getChipEffectSet(plm)
>       fln <- FragmentLengthNormalization(ces,target="zero")
>       print(fln)
>       cesNList[[chipType]] <- process(fln, verbose=verbose)
>       print("..Finished PCR fragment normalization")
>       ####
>
>
>       ####### Extract raw copy numbers ########
>             ### Identify and extract normal samples on correct chipType ##
>       if(chipType=="Mapping250K_Nsp"){
>           normal.names<-cesNormal_Nsp.names
>       }else{
>           normal.names<-cesNormal_Sty.names
>           #normal.names<-cesNormal_Sty2.names
>       }
>       cesNormal <- extract(cesNList[[chipType]], normal.names)
>       ceR <- getAverageFile(cesNormal);
>       ###
>             ### Extract information from the cdf for the actual chipType ###
>       cdf <- getCdf(cesNormal);
>       gi <- getGenomeInformation(cdf);
>       ###
>             ### Extract the actual sample names for the data set, excluding
> normals ###
>       nbr.assays<-length(cesNList[[chipType]])
>       assay.vector<-c()
>       for(i in 1:nbr.assays){
>           assay.name<-cesNList[[chipType]]$files[[i]]$Name
>           if(length(which(normal.names==assay.name))==0){
>               # only extract non-normal samples
>               assay.vector<-c(assay.vector,assay.name)
>           }
>       }
>
> #write.table(assay.vector,paste("CNdata/",data.set.name,"/Assay_names_",chipType,".txt",sep=""),col.names=c("Assay"),row.names=FALSE,sep="\t",quote=FALSE)
>       ####
>             ### Extract CN for the samples only ###
>       cesSamples <- extract(cesNList[[chipType]], assay.vector)
>       fData<-data.frame()
>       check.chr.nbr<-matrix(nrow=23,ncol=1,0)
>       for(j in 1:23){
>           print(paste("extracting chromosome",j))
>           units<-getUnitsOnChromosome(gi,j)
>           pos <- getPositions(gi, units=units);
>           uu<-order(pos)
>           unitNames <- getUnitNames(cdf, units=units);
>                             theta <- extractMatrix(cesSamples,
> units=units,addNames=TRUE);
>           rownames(theta) <- unitNames;
>           thetaR<-as.numeric(extractMatrix(ceR,units=units))
>           M.chr <- log2(theta/thetaR);
>           M.chr<-M.chr[uu,]
>           check.chr.nbr[j]<-dim(M.chr)[1]
>                     # Save each chromosome at a time #
>
> save(M.chr,file=paste("CNdata/",data.set.name,"/chr_",j,"_",chipType,"_CopyNumber.RData",sep=""))
>                     # Assemble FeatureData
>           chr.fData<-data.frame(unitNames[uu],rep(j,length(uu)),pos[uu])
>           fData<-rbind(fData,chr.fData)
>                     rm(M.chr,uu,theta,thetaR,unitNames,units,pos,chr.fData)
>           gc();
>       }
>             ## Save feature data for this chipType
>       colnames(fData)<-c("reporterId","chromosome","centerPosition")
>
> save(fData,file=paste("CNdata/",data.set.name,"/",chipType,"_FeatureData.RData",sep=""))
>       rm(fData)
>       gc();
>       #################
>
> Hope this is readable/understandable. Thanks!
>
> Best regards
> Johan S
>
> Henrik Bengtsson wrote:
>>
>> Opps, I missed that it was the Mapping250K_Snp chip type.  That one
>> only got SNPs and no pure CN loci, so there has to be another
>> explanation than the one I provided.  So, again a complete script will
>> help troubleshoot.
>>
>> /Henrik
>>
>> On Wed, Jun 23, 2010 at 2:01 PM, Henrik Bengtsson <h...@stat.berkeley.edu>
>> wrote:
>>
>>>
>>> Hi Johan,
>>>
>>> this certainly looks like a computational hiccup.  I never seen it,
>>> though I can imagine various ways how it could happen.  Instead of
>>> guessing, do you have a complete script that you did, you did you type
>>> the commands on the command line one by one?  You are saying you did
>>> the analysis "according to the tutorial for 10-500K analysis"
>>> (CRMAv1), but the steps you describe are from the CRMAv2 method.
>>>
>>> An most importantly, because I think the answer is in the here, how
>>> did you extract the summarized data and how did you generate the plot?
>>>  How did you calculate your reference signals?  One of my guesses is
>>> that the upper CN band, which looks to be correct, are for the
>>> non-polymorphic CN loci, whereas the lower one, which is shifted, is
>>> from SNP signals.  It could be that you are somehow only plotting
>>> allele-specific CA and/or CB signals and not C=CA+CB for SNPs.
>>>
>>> So, if you send all the commands verbatim, I can let you know what
>>> needs to be changed.
>>>
>>> /Henrik
>>>
>>>
>>> On Wed, Jun 23, 2010 at 9:02 AM, Johan Staaf <johan.st...@med.lu.se>
>>> wrote:
>>>
>>>>
>>>> Hi Henrik
>>>> I have a question about strange looking genomic profiles for Affymetrix
>>>> 250K
>>>> Sty2 chips from GSE14994 after CRMAv2 analysis (please see attached png
>>>> figures). Processing was done using calibration for allelic crosstalk,
>>>> normalization for nucleotide-position probe sequence effects, probe
>>>> summarization, and pcr fragment normalization according to the tutorial
>>>> for
>>>> 10-500K analysis. CN was obtained by comparison to normal samples also
>>>> obtained from public repositories and processed simultaneously.
>>>>
>>>> Do you know of the cause of this, and how it could be corrected?
>>>>
>>>> Best regards
>>>> Johan
>>>>
>>>> --
>>>> When reporting problems on aroma.affymetrix, make sure 1) to run the
>>>> latest
>>>> version of the package, 2) to report the output of sessionInfo() and
>>>> traceback(), and 3) to post a complete code example.
>>>>
>>>>
>>>> You received this message because you are subscribed to the Google
>>>> Groups
>>>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>>>> To post to this group, send email to aroma-affymetrix@googlegroups.com
>>>> To unsubscribe and other options, go to
>>>> http://www.aroma-project.org/forum/
>>>>
>>>>
>>
>>
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

Re: [aroma.affymetrix] Problems with Affymetrix 250K Sty2 arrays after CRMAv2 analysis

Reply via email to