[aroma.affymetrix] Re: Problem understanding FIRMA

cstratowa Mon, 23 Mar 2009 03:05:33 -0700

Dear Mark

I have sent you the PNG file to your email.


Best regards
Christian

On Mar 23, 9:44 am, "Mark Robinson" <[email protected]> wrote:
> Hi Christian.
>
> Thanks for this.  That's great!
>
> Can you send me a copy of your PNG file to my email?
>
> I will add this 
> to:http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests
>
> Cheers,
> Mark
>
>
>
> > Dear Mark,
>
> > I have now created the following function, which people can use to
> > compute firma scores:
>
> > # - - - - - - - - - - - - - - - - - - -
> > firma <- function(m, method="rlm") {
> > ## function to compute FIRMA scores (adapted from aroma.affymetrix)
> > ## m:      dataframe containing normalized probe intensities (linear
> > scale)
> > ##         with 1.column containing the transcript_cluster_id
> > ##         and 2. column containing the corresponding probeset_ids
> > ## method: fitting model, one of "rlm" or "mdp"
>
> >    ## convert expression levels to log2
> >    y <- as.matrix(log2(m[,3:ncol(m)]));
>
> >    ## dimensions
> >    K <- nrow(y);   # number of probes
> >    I <- ncol(y);   # number of arrays
>
> >    ## fit log-additive model
> >    if (method == "rlm") {
> >       fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345,
> > PACKAGE="preprocessCore");
> >    } else if (method == "mdp") {
> >       mp <- medpolish(y, trace.iter=FALSE);
> >       fit <- list(Estimates = c(mp$overall + mp$col,  mp$row),
> >                   StdErrors = rep(0, length(c(mp$row, mp$col))));
> >    } else {
> >       stop("method must be <rlm, mdp>");
> >    }#if
>
> >    ## extract parameters
> >    est <- fit$Estimates;
> >    se  <- fit$StdErrors;
>
> >    ## chip effects
> >    beta <- est[1:I];
>
> >    ## probe affinities
> >    if (K == 1) {
> >       ## if only one probe must have affinity=1 since sum constraint
> >       alpha <- 0;
> >    } else {
> >       ## affinities sum to zero (on log scale)
> >       alpha <- est[(I+1):length(est)];
> >       alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]);
> >    }#if
>
> >     ## estimates on the intensity scale
> >     theta <- 2^beta;
> >     phi   <- 2^alpha;
>
> >    ## calculate residuals
> >    phi   <- matrix(phi,   nrow=K, ncol=I, byrow=FALSE);
> >    theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE);
> >    yhat  <- phi *theta;
> >    eps   <- (2^y)/yhat;  # rma uses y/yhat
>
> >    ## estimate of standard error
> >    u.mad <- mad(unlist(log2(eps)), center=0);
>
> >    ## get probeset_ids
> >    id <- unique(m[,2]);
>
> >    ## firma scores
> >    fs <- sapply(id,
> > function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2,
> > median)});
> >    fs <- t(fs);
> >    rownames(fs) <- id;
>
> >    return(fs);
> > }#firma
> > # - - - - - - - - - - - - - - - - - - -
>
> > Now I can apply this function to the normalized intensities of gene
> > UNR:
>
> >  > library(preprocessCore)
> >  > score.mdp <- firma(unr, "mdp")
> >  > score.rml <- firma(unr, "rlm")
>
> > Then I can plot the firma scores for robust fitting ("rlm") compared
> > to
> > median-polish ("mdp"):
>
> >  > png(file="firmaplots6.png", width=540, height=800)
> >  > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2))
> >  > for (i in 1:6) {
> >  >    plot(score.rml[,i], type="l", ylim=c(-3,5),
> > main=colnames(score.rml)[i], xlab="Probeset", ylab="Score")
> >  >    abline(h=0, lty=3)
> >  >    lines(score.mdp[,i], lty=2)
> >  > }
> >  > par(oldpar)
> >  > dev.off()
>
> > The attached plots compare "rlm" (solid lines) and "mdp" (dashed
> > lines)
> > and should reflect the firma scores for heart and muscle shown in
> > Figure
> > 3B of Purdom et al.
> > As you can see the difference between"rlm" and "mdp" is pretty small.
>
> > Best regards
> > Christian
>
> > P.S.: It seems that I cannot attach the png-file.
>
> > On Mar 18, 8:28 am, cstratowa <[email protected]
> > ingelheim.com> wrote:
> >> Dear Mark,
>
> >> Probably, there is not much difference, maybe I will check it.
> >> I will let you know but it will take some time.
>
> >> Best regards
> >> Christian
>
> >> On Mar 17, 11:46 am, Mark Robinson <[email protected]> wrote:
>
> >> > Hi Christian.
>
> >> > > However, in this respect I have also the following question:
> >> > > How does using "median polish" compare to using
> >> > > "R_rlm_rma_default_model"?
> >> > > Are the final scores still of some use if you use medpol?
>
> >> > Short answer is I haven't investigated this too thoroughly.
>
> >> > But, my guess is that it wouldn't be "too" different.  That prediction
> >>  
> >> > is based on the fact that the chip effects are in the same ballpark,  
> >> > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)  
> >> > plot in the following thread:
>
> >> >http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...
>
> >> > But, I'd be interested to hear more details if you do look into it
> >> more.
>
> >> > Cheers,
> >> > Mark
>
> >> > > Best regards
> >> > > Christian
>
> >> > > On Mar 16, 9:13 am, Mark Robinson <[email protected]> wrote:
> >> > >> Hi Christian.
>
> >> > >>  From what I can tell looking at your code (rather quickly, i must
> >> > >> admit), there will be 2 differences between aroma.affymetrix and
> >> what
> >> > >> you have:
>
> >> > >> 1. We use the 'preprocessCore' codebase for the robust fitting of
> >> the
> >> > >> linear model (... but maybe you are just using median polish as an
> >> > >> illustration).  For example, you might try:
>
> >> > >> library(preprocessCore)
> >> > >> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,
> >> > >> 1.345,PACKAGE="preprocessCore")
> >> > >> [... and piece together the alpha, beta, etc ...]
>
> >> > >> 2. The "estimate of standard error" is calculated genewise, over
> >> > >> residuals from all probes/samples (i.e. u.mad should be a scalar  
> >> > >> not a
> >> > >> vector).
>
> >> > >> Hope that helps.
> >> > >> Mark
>
> >> > >> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>
> >> > >>> Dear all,
>
> >> > >>> After reading the FIRMA paper I would like to understand the
> >> > >>> implementation, but this is not easy since the source code is hard
> >>  
> >> > >>> to
> >> > >>> read. Nevertheless, I tried and would like to know if this is  
> >> > >>> correct.
>
> >> > >>> According to the page on exon array analysis you do the following:
>
> >> > >>> I, fit a summary of the entire transcript
> >> > >>>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> >> > >>>> fit(plmTr, verbose=verbose)
>
> >> > >>> II, fit the FIRMA model for each exon
> >> > >>>> firma <- FirmaModel(plmTr)
> >> > >>>> fit(firma, verbose=verbose)
>
> >> > >>> However, I would like to understand the underlying source code.
>
> >> > >>> For this example let us assume that we have quantile-normalized
> >> > >>> intensities yTr for a transcript  containing two exons:
> >> > >>>> yTr
> >> > >>>     HeartA   HeartB    HeartC  MuscleA  MuscleB  MuscleC
> >> > >>> 1   5.74954   18.0296    2.50436   15.5857   26.1744   31.0075
> >> > >>> 2   9.59819   23.0093   22.01120   70.1742   32.8408  102.0080
> >> > >>> 3 114.50800   87.1742   70.34080  312.3410  266.1740  601.3410
> >> > >>> 4  66.34080   52.0075   67.34080  184.1740  266.1740  147.0080
> >> > >>> 5 210.17400  142.0080  173.34100  514.5080  659.1740  509.6740
> >> > >>> 6 104.00800   84.3408   70.34080  333.5080  324.1740  231.0080
> >> > >>> 7 194.00800  124.5080  234.00800  443.6740  767.5080  716.8410
> >> > >>> 8 319.34100  282.6740  283.50800  656.0080  807.6740  954.6740
>
> >> > >>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>
> >> > >>> I, fit a summary of the entire transcript
> >> > >>> To simplify issues I will fit the data using median polish:
> >> > >>> # 1. fit median polish
> >> > >>>> mp <- medpolish(log2(yTr))
>
> >> > >>> # 2. data set specific estimates (probe affinities)
> >> > >>>> beta  <- mp$overall+mp$col
> >> > >>>> thetaTr <- 2^beta
>
> >> > >>> # 3. array-specific estimates
> >> > >>>> alpha <- mp$row
> >> > >>>> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
> >> > >>>> phiTr <- 2^alpha
>
> >> > >>> II, fit FIRMA model for each exon
> >> > >>> # 1. calculate residuals
> >> > >>>> phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
> >> > >>>> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
> >> > >>> byrow=TRUE)
> >> > >>>> yhat  <- phi *theta
> >> > >>>> eps   <- yTr/yhat    # rma uses y/yhat
>
> >> > >>> # 2. estimate of standard error
> >> > >>>> u.mad <- apply(log2(eps), 2, mad, center=0)
>
> >> > >>> # 3. compute final score statisitc
> >> > >>> # for 1. exon
> >> > >>>> y1 <- log2(eps[1:4,])
> >> > >>>> F1 <- apply(y1/u.mad, 2, median)
> >> > >>>> F1
> >> > >>>     HeartA      HeartB      HeartC     MuscleA     MuscleB
> >> > >>> MuscleC
> >> > >>> -0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296
> >> > >>> 1.08709568
>
> >> > >>> # for 2. exon
> >> > >>>> y2 <- log2(eps[5:8,])
> >> > >>>> F2 <- apply(y2/u.mad, 2, median)
> >> > >>>> F2
> >> > >>>     HeartA      HeartB      HeartC     MuscleA     MuscleB
> >> > >>> MuscleC
> >> > >>> -0.02899616 -1.64645153 -0.70048533 -0.39996057  0.02666064
> >> > >>> -1.46657055
>
> >> > >>> Now my question is:
> >> > >>> Is this calculation of the final score statistic F1 for exon 1 and
> >>  
> >> > >>> F2
> >> > >>> for exon 2 correct?
> >> > >>> Did I miss something?
>
> >> > >>> Best regards
> >> > >>> Christian
>
> >> > >> ------------------------------
> >> > >> Mark Robinson
> >> > >> Epigenetics Laboratory, Garvan
> >> > >> Bioinformatics Division, WEHI
> >> > >> e: [email protected]
> >> > >> e: [email protected]
> >> > >> p: +61 (0)3 9345 2628
> >> > >> f: +61 (0)3 9347 0852
> >> > >> ------------------------------
>
> >> > ------------------------------
> >> > Mark Robinson
> >> > Epigenetics Laboratory, Garvan
> >> > Bioinformatics Division, WEHI
> >> > e: [email protected]
> >> > e: [email protected]
> >> > p: +61 (0)3 9345 2628
> >> > f: +61 (0)3 9347 0852
> >> > ------------------------------
--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: Problem understanding FIRMA

Reply via email to