[aroma.affymetrix] Re: Problem understanding FIRMA

cstratowa Mon, 23 Mar 2009 00:41:32 -0700

Dear Mark,

I have now created the following function, which people can use to
compute firma scores:


# - - - - - - - - - - - - - - - - - - -
firma <- function(m, method="rlm") {
## function to compute FIRMA scores (adapted from aroma.affymetrix)
## m:      dataframe containing normalized probe intensities (linear
scale)
##         with 1.column containing the transcript_cluster_id
##         and 2. column containing the corresponding probeset_ids
## method: fitting model, one of "rlm" or "mdp"

   ## convert expression levels to log2
   y <- as.matrix(log2(m[,3:ncol(m)]));

   ## dimensions
   K <- nrow(y);   # number of probes
   I <- ncol(y);   # number of arrays

   ## fit log-additive model
   if (method == "rlm") {
      fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345,
PACKAGE="preprocessCore");
   } else if (method == "mdp") {
      mp <- medpolish(y, trace.iter=FALSE);
      fit <- list(Estimates = c(mp$overall + mp$col,  mp$row),
                  StdErrors = rep(0, length(c(mp$row, mp$col))));
   } else {
      stop("method must be <rlm, mdp>");
   }#if

   ## extract parameters
   est <- fit$Estimates;
   se  <- fit$StdErrors;

   ## chip effects
   beta <- est[1:I];

   ## probe affinities
   if (K == 1) {
      ## if only one probe must have affinity=1 since sum constraint
      alpha <- 0;
   } else {
      ## affinities sum to zero (on log scale)
      alpha <- est[(I+1):length(est)];
      alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]);
   }#if

    ## estimates on the intensity scale
    theta <- 2^beta;
    phi   <- 2^alpha;

   ## calculate residuals
   phi   <- matrix(phi,   nrow=K, ncol=I, byrow=FALSE);
   theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE);
   yhat  <- phi *theta;
   eps   <- (2^y)/yhat;  # rma uses y/yhat

   ## estimate of standard error
   u.mad <- mad(unlist(log2(eps)), center=0);

   ## get probeset_ids
   id <- unique(m[,2]);

   ## firma scores
   fs <- sapply(id,
function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2,
median)});
   fs <- t(fs);
   rownames(fs) <- id;

   return(fs);
}#firma
# - - - - - - - - - - - - - - - - - - -

Now I can apply this function to the normalized intensities of gene
UNR:

 > library(preprocessCore)
 > score.mdp <- firma(unr, "mdp")
 > score.rml <- firma(unr, "rlm")

Then I can plot the firma scores for robust fitting ("rlm") compared
to
median-polish ("mdp"):

 > png(file="firmaplots6.png", width=540, height=800)
 > oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2))
 > for (i in 1:6) {
 >    plot(score.rml[,i], type="l", ylim=c(-3,5),
main=colnames(score.rml)[i], xlab="Probeset", ylab="Score")
 >    abline(h=0, lty=3)
 >    lines(score.mdp[,i], lty=2)
 > }
 > par(oldpar)
 > dev.off()

The attached plots compare "rlm" (solid lines) and "mdp" (dashed
lines)
and should reflect the firma scores for heart and muscle shown in
Figure
3B of Purdom et al.
As you can see the difference between"rlm" and "mdp" is pretty small.

Best regards
Christian

P.S.: It seems that I cannot attach the png-file.


On Mar 18, 8:28 am, cstratowa <[email protected]
ingelheim.com> wrote:
> Dear Mark,
>
> Probably, there is not much difference, maybe I will check it.
> I will let you know but it will take some time.
>
> Best regards
> Christian
>
> On Mar 17, 11:46 am, Mark Robinson <[email protected]> wrote:
>
> > Hi Christian.
>
> > > However, in this respect I have also the following question:
> > > How does using "median polish" compare to using
> > > "R_rlm_rma_default_model"?
> > > Are the final scores still of some use if you use medpol?
>
> > Short answer is I haven't investigated this too thoroughly.
>
> > But, my guess is that it wouldn't be "too" different.  That prediction  
> > is based on the fact that the chip effects are in the same ballpark,  
> > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)  
> > plot in the following thread:
>
> >http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...
>
> > But, I'd be interested to hear more details if you do look into it more.
>
> > Cheers,
> > Mark
>
> > > Best regards
> > > Christian
>
> > > On Mar 16, 9:13 am, Mark Robinson <[email protected]> wrote:
> > >> Hi Christian.
>
> > >>  From what I can tell looking at your code (rather quickly, i must
> > >> admit), there will be 2 differences between aroma.affymetrix and what
> > >> you have:
>
> > >> 1. We use the 'preprocessCore' codebase for the robust fitting of the
> > >> linear model (... but maybe you are just using median polish as an
> > >> illustration).  For example, you might try:
>
> > >> library(preprocessCore)
> > >> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,
> > >> 1.345,PACKAGE="preprocessCore")
> > >> [... and piece together the alpha, beta, etc ...]
>
> > >> 2. The "estimate of standard error" is calculated genewise, over
> > >> residuals from all probes/samples (i.e. u.mad should be a scalar  
> > >> not a
> > >> vector).
>
> > >> Hope that helps.
> > >> Mark
>
> > >> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>
> > >>> Dear all,
>
> > >>> After reading the FIRMA paper I would like to understand the
> > >>> implementation, but this is not easy since the source code is hard  
> > >>> to
> > >>> read. Nevertheless, I tried and would like to know if this is  
> > >>> correct.
>
> > >>> According to the page on exon array analysis you do the following:
>
> > >>> I, fit a summary of the entire transcript
> > >>>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> > >>>> fit(plmTr, verbose=verbose)
>
> > >>> II, fit the FIRMA model for each exon
> > >>>> firma <- FirmaModel(plmTr)
> > >>>> fit(firma, verbose=verbose)
>
> > >>> However, I would like to understand the underlying source code.
>
> > >>> For this example let us assume that we have quantile-normalized
> > >>> intensities yTr for a transcript  containing two exons:
> > >>>> yTr
> > >>>     HeartA   HeartB    HeartC  MuscleA  MuscleB  MuscleC
> > >>> 1   5.74954   18.0296    2.50436   15.5857   26.1744   31.0075
> > >>> 2   9.59819   23.0093   22.01120   70.1742   32.8408  102.0080
> > >>> 3 114.50800   87.1742   70.34080  312.3410  266.1740  601.3410
> > >>> 4  66.34080   52.0075   67.34080  184.1740  266.1740  147.0080
> > >>> 5 210.17400  142.0080  173.34100  514.5080  659.1740  509.6740
> > >>> 6 104.00800   84.3408   70.34080  333.5080  324.1740  231.0080
> > >>> 7 194.00800  124.5080  234.00800  443.6740  767.5080  716.8410
> > >>> 8 319.34100  282.6740  283.50800  656.0080  807.6740  954.6740
>
> > >>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>
> > >>> I, fit a summary of the entire transcript
> > >>> To simplify issues I will fit the data using median polish:
> > >>> # 1. fit median polish
> > >>>> mp <- medpolish(log2(yTr))
>
> > >>> # 2. data set specific estimates (probe affinities)
> > >>>> beta  <- mp$overall+mp$col
> > >>>> thetaTr <- 2^beta
>
> > >>> # 3. array-specific estimates
> > >>>> alpha <- mp$row
> > >>>> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
> > >>>> phiTr <- 2^alpha
>
> > >>> II, fit FIRMA model for each exon
> > >>> # 1. calculate residuals
> > >>>> phi   <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
> > >>>> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
> > >>> byrow=TRUE)
> > >>>> yhat  <- phi *theta
> > >>>> eps   <- yTr/yhat    # rma uses y/yhat
>
> > >>> # 2. estimate of standard error
> > >>>> u.mad <- apply(log2(eps), 2, mad, center=0)
>
> > >>> # 3. compute final score statisitc
> > >>> # for 1. exon
> > >>>> y1 <- log2(eps[1:4,])
> > >>>> F1 <- apply(y1/u.mad, 2, median)
> > >>>> F1
> > >>>     HeartA      HeartB      HeartC     MuscleA     MuscleB
> > >>> MuscleC
> > >>> -0.89938777 -0.03792624 -0.69409936  0.11536565 -0.61385296
> > >>> 1.08709568
>
> > >>> # for 2. exon
> > >>>> y2 <- log2(eps[5:8,])
> > >>>> F2 <- apply(y2/u.mad, 2, median)
> > >>>> F2
> > >>>     HeartA      HeartB      HeartC     MuscleA     MuscleB
> > >>> MuscleC
> > >>> -0.02899616 -1.64645153 -0.70048533 -0.39996057  0.02666064
> > >>> -1.46657055
>
> > >>> Now my question is:
> > >>> Is this calculation of the final score statistic F1 for exon 1 and  
> > >>> F2
> > >>> for exon 2 correct?
> > >>> Did I miss something?
>
> > >>> Best regards
> > >>> Christian
>
> > >> ------------------------------
> > >> Mark Robinson
> > >> Epigenetics Laboratory, Garvan
> > >> Bioinformatics Division, WEHI
> > >> e: [email protected]
> > >> e: [email protected]
> > >> p: +61 (0)3 9345 2628
> > >> f: +61 (0)3 9347 0852
> > >> ------------------------------
>
> > ------------------------------
> > Mark Robinson
> > Epigenetics Laboratory, Garvan
> > Bioinformatics Division, WEHI
> > e: [email protected]
> > e: [email protected]
> > p: +61 (0)3 9345 2628
> > f: +61 (0)3 9347 0852
> > ------------------------------
--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: Problem understanding FIRMA

Reply via email to