Dear Mark,
I have now created the following function, which people can use to
compute firma scores:
# - - - - - - - - - - - - - - - - - - -
firma <- function(m, method="rlm") {
## function to compute FIRMA scores (adapted from aroma.affymetrix)
## m: dataframe containing normalized probe intensities (linear
scale)
## with 1.column containing the transcript_cluster_id
## and 2. column containing the corresponding probeset_ids
## method: fitting model, one of "rlm" or "mdp"
## convert expression levels to log2
y <- as.matrix(log2(m[,3:ncol(m)]));
## dimensions
K <- nrow(y); # number of probes
I <- ncol(y); # number of arrays
## fit log-additive model
if (method == "rlm") {
fit <- .Call("R_rlm_rma_default_model", y, 0, 1.345,
PACKAGE="preprocessCore");
} else if (method == "mdp") {
mp <- medpolish(y, trace.iter=FALSE);
fit <- list(Estimates = c(mp$overall + mp$col, mp$row),
StdErrors = rep(0, length(c(mp$row, mp$col))));
} else {
stop("method must be <rlm, mdp>");
}#if
## extract parameters
est <- fit$Estimates;
se <- fit$StdErrors;
## chip effects
beta <- est[1:I];
## probe affinities
if (K == 1) {
## if only one probe must have affinity=1 since sum constraint
alpha <- 0;
} else {
## affinities sum to zero (on log scale)
alpha <- est[(I+1):length(est)];
alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)]);
}#if
## estimates on the intensity scale
theta <- 2^beta;
phi <- 2^alpha;
## calculate residuals
phi <- matrix(phi, nrow=K, ncol=I, byrow=FALSE);
theta <- matrix(theta, nrow=K, ncol=I, byrow=TRUE);
yhat <- phi *theta;
eps <- (2^y)/yhat; # rma uses y/yhat
## estimate of standard error
u.mad <- mad(unlist(log2(eps)), center=0);
## get probeset_ids
id <- unique(m[,2]);
## firma scores
fs <- sapply(id,
function(x){apply(log2(eps[which(m[,2]==x),,drop=F])/u.mad, 2,
median)});
fs <- t(fs);
rownames(fs) <- id;
return(fs);
}#firma
# - - - - - - - - - - - - - - - - - - -
Now I can apply this function to the normalized intensities of gene
UNR:
> library(preprocessCore)
> score.mdp <- firma(unr, "mdp")
> score.rml <- firma(unr, "rlm")
Then I can plot the firma scores for robust fitting ("rlm") compared
to
median-polish ("mdp"):
> png(file="firmaplots6.png", width=540, height=800)
> oldpar <- par(pty="m", mfcol=c(3, 2), mar=c(5,5,4,2))
> for (i in 1:6) {
> plot(score.rml[,i], type="l", ylim=c(-3,5),
main=colnames(score.rml)[i], xlab="Probeset", ylab="Score")
> abline(h=0, lty=3)
> lines(score.mdp[,i], lty=2)
> }
> par(oldpar)
> dev.off()
The attached plots compare "rlm" (solid lines) and "mdp" (dashed
lines)
and should reflect the firma scores for heart and muscle shown in
Figure
3B of Purdom et al.
As you can see the difference between"rlm" and "mdp" is pretty small.
Best regards
Christian
P.S.: It seems that I cannot attach the png-file.
On Mar 18, 8:28 am, cstratowa <[email protected]
ingelheim.com> wrote:
> Dear Mark,
>
> Probably, there is not much difference, maybe I will check it.
> I will let you know but it will take some time.
>
> Best regards
> Christian
>
> On Mar 17, 11:46 am, Mark Robinson <[email protected]> wrote:
>
> > Hi Christian.
>
> > > However, in this respect I have also the following question:
> > > How does using "median polish" compare to using
> > > "R_rlm_rma_default_model"?
> > > Are the final scores still of some use if you use medpol?
>
> > Short answer is I haven't investigated this too thoroughly.
>
> > But, my guess is that it wouldn't be "too" different. That prediction
> > is based on the fact that the chip effects are in the same ballpark,
> > as you can see from the Aroma_vs_Affy (Aroma=R_rlm_rma, Affy=medpol)
> > plot in the following thread:
>
> >http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/...
>
> > But, I'd be interested to hear more details if you do look into it more.
>
> > Cheers,
> > Mark
>
> > > Best regards
> > > Christian
>
> > > On Mar 16, 9:13 am, Mark Robinson <[email protected]> wrote:
> > >> Hi Christian.
>
> > >> From what I can tell looking at your code (rather quickly, i must
> > >> admit), there will be 2 differences between aroma.affymetrix and what
> > >> you have:
>
> > >> 1. We use the 'preprocessCore' codebase for the robust fitting of the
> > >> linear model (... but maybe you are just using median polish as an
> > >> illustration). For example, you might try:
>
> > >> library(preprocessCore)
> > >> f <- .Call("R_rlm_rma_default_model", log2(yTr), 0,
> > >> 1.345,PACKAGE="preprocessCore")
> > >> [... and piece together the alpha, beta, etc ...]
>
> > >> 2. The "estimate of standard error" is calculated genewise, over
> > >> residuals from all probes/samples (i.e. u.mad should be a scalar
> > >> not a
> > >> vector).
>
> > >> Hope that helps.
> > >> Mark
>
> > >> On 16/03/2009, at 6:32 PM, cstratowa wrote:
>
> > >>> Dear all,
>
> > >>> After reading the FIRMA paper I would like to understand the
> > >>> implementation, but this is not easy since the source code is hard
> > >>> to
> > >>> read. Nevertheless, I tried and would like to know if this is
> > >>> correct.
>
> > >>> According to the page on exon array analysis you do the following:
>
> > >>> I, fit a summary of the entire transcript
> > >>>> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> > >>>> fit(plmTr, verbose=verbose)
>
> > >>> II, fit the FIRMA model for each exon
> > >>>> firma <- FirmaModel(plmTr)
> > >>>> fit(firma, verbose=verbose)
>
> > >>> However, I would like to understand the underlying source code.
>
> > >>> For this example let us assume that we have quantile-normalized
> > >>> intensities yTr for a transcript containing two exons:
> > >>>> yTr
> > >>> HeartA HeartB HeartC MuscleA MuscleB MuscleC
> > >>> 1 5.74954 18.0296 2.50436 15.5857 26.1744 31.0075
> > >>> 2 9.59819 23.0093 22.01120 70.1742 32.8408 102.0080
> > >>> 3 114.50800 87.1742 70.34080 312.3410 266.1740 601.3410
> > >>> 4 66.34080 52.0075 67.34080 184.1740 266.1740 147.0080
> > >>> 5 210.17400 142.0080 173.34100 514.5080 659.1740 509.6740
> > >>> 6 104.00800 84.3408 70.34080 333.5080 324.1740 231.0080
> > >>> 7 194.00800 124.5080 234.00800 443.6740 767.5080 716.8410
> > >>> 8 319.34100 282.6740 283.50800 656.0080 807.6740 954.6740
>
> > >>> Here rows 1:4 code for exon 1 and rows 5:8 code for exon 2.
>
> > >>> I, fit a summary of the entire transcript
> > >>> To simplify issues I will fit the data using median polish:
> > >>> # 1. fit median polish
> > >>>> mp <- medpolish(log2(yTr))
>
> > >>> # 2. data set specific estimates (probe affinities)
> > >>>> beta <- mp$overall+mp$col
> > >>>> thetaTr <- 2^beta
>
> > >>> # 3. array-specific estimates
> > >>>> alpha <- mp$row
> > >>>> alpha[length(alpha)] <- -sum(alpha[1:(length(alpha)-1)])
> > >>>> phiTr <- 2^alpha
>
> > >>> II, fit FIRMA model for each exon
> > >>> # 1. calculate residuals
> > >>>> phi <- matrix(phiTr, nrow=nrow(yTr), ncol=ncol(yTr))
> > >>>> theta <- matrix(thetaTr, nrow=nrow(yTr), ncol=ncol(yTr),
> > >>> byrow=TRUE)
> > >>>> yhat <- phi *theta
> > >>>> eps <- yTr/yhat # rma uses y/yhat
>
> > >>> # 2. estimate of standard error
> > >>>> u.mad <- apply(log2(eps), 2, mad, center=0)
>
> > >>> # 3. compute final score statisitc
> > >>> # for 1. exon
> > >>>> y1 <- log2(eps[1:4,])
> > >>>> F1 <- apply(y1/u.mad, 2, median)
> > >>>> F1
> > >>> HeartA HeartB HeartC MuscleA MuscleB
> > >>> MuscleC
> > >>> -0.89938777 -0.03792624 -0.69409936 0.11536565 -0.61385296
> > >>> 1.08709568
>
> > >>> # for 2. exon
> > >>>> y2 <- log2(eps[5:8,])
> > >>>> F2 <- apply(y2/u.mad, 2, median)
> > >>>> F2
> > >>> HeartA HeartB HeartC MuscleA MuscleB
> > >>> MuscleC
> > >>> -0.02899616 -1.64645153 -0.70048533 -0.39996057 0.02666064
> > >>> -1.46657055
>
> > >>> Now my question is:
> > >>> Is this calculation of the final score statistic F1 for exon 1 and
> > >>> F2
> > >>> for exon 2 correct?
> > >>> Did I miss something?
>
> > >>> Best regards
> > >>> Christian
>
> > >> ------------------------------
> > >> Mark Robinson
> > >> Epigenetics Laboratory, Garvan
> > >> Bioinformatics Division, WEHI
> > >> e: [email protected]
> > >> e: [email protected]
> > >> p: +61 (0)3 9345 2628
> > >> f: +61 (0)3 9347 0852
> > >> ------------------------------
>
> > ------------------------------
> > Mark Robinson
> > Epigenetics Laboratory, Garvan
> > Bioinformatics Division, WEHI
> > e: [email protected]
> > e: [email protected]
> > p: +61 (0)3 9345 2628
> > f: +61 (0)3 9347 0852
> > ------------------------------
--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest
version of the package, 2) to report the output of sessionInfo() and
traceback(), and 3) to post a complete code example.
You received this message because you are subscribed to the Google Groups
"aroma.affymetrix" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---