[Bioc-sig-seq] edgeR tagwise estimates not converging to common estimate with large prior.n value

Gordon K Smyth Fri, 16 Sep 2011 17:48:19 -0700

Dear Sean,

The dispersion estimation functions in edgeR have a lower limit for the dispersions that they will estimate. For estimateCommonDisp(), the lower limit is just above 0.0001. For estimateTagwiseDisp(), the lower limit is just above 0.001. For your data, the ideal dispersion estimate appears to be zero, so the functions are simply returning to you the pre-set lower limits.

I agree that it was a bit sloppy of us (the edgeR authors) for the lower limits to be inconsistent between the functions. The reason for estimateTagwiseDisp() having a higher limit is that it does a grid search, so we wanted to limit the number of grid points for computational efficiency.
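
To illustrate why a bounded search hands back the boundary itself, here is a minimal base R sketch. It is not edgeR's internal code: the function names nb_loglik() and grid_search(), the toy counts and the grid size are all assumptions made for illustration. The toy counts vary less than Poisson counts would, so the negative binomial likelihood keeps rising as the dispersion shrinks, and the search stops at whatever floor the grid imposes.

```r
# Hypothetical illustration in base R; this is NOT how edgeR is implemented.

# Toy counts with mean 50 and variance well below 50 (less than Poisson).
y  <- rep(c(48, 49, 50, 51, 52), times = 4)
mu <- mean(y)

# Negative binomial log-likelihood as a function of the dispersion phi,
# parameterised so that var = mu + phi * mu^2 (size = 1 / phi).
nb_loglik <- function(phi, y, mu) {
  sum(dnbinom(y, size = 1 / phi, mu = mu, log = TRUE))
}

# Grid search over [lower, upper], with grid points equally spaced on the
# log scale. The result can never be smaller than 'lower'.
grid_search <- function(y, mu, lower, upper = 1, n_points = 50) {
  grid <- exp(seq(log(lower), log(upper), length.out = n_points))
  ll   <- vapply(grid, nb_loglik, numeric(1), y = y, mu = mu)
  grid[which.max(ll)]
}

est_coarse <- grid_search(y, mu, lower = 1e-3)  # floor at 0.001
est_fine   <- grid_search(y, mu, lower = 1e-4)  # floor at 0.0001

# Both estimates sit at their respective floors, a factor of 10 apart.
print(c(coarse = est_coarse, fine = est_fine))
```

In this sketch the two "estimates" differ by a factor of 10 only because the floors differ, which mirrors the pattern in your summary tables.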

The new glm functions in edgeR, estimateGLMCommonDisp() etc., have somewhat less restrictive lower limits than the classic functions that you are using.

The bottom line is that with technical data such as the yeast data, we do not view the differences between dispersion estimates of 1e-3 or 1e-4 as scientifically meaningful. We would simply observe that the dispersion appears to be at the lower boundary, showing that the data has essentially no biological variability. We would set the dispersions to be zero.
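
As a rough worked example of how small these differences are, the sketch below evaluates the negative binomial variance, var = mu + phi * mu^2, at dispersions of 0, 1e-4 and 1e-3. The expected count of 100 and the helper name nb_var() are assumptions chosen only for illustration. The square root of the dispersion is the coefficient of variation of the extra-Poisson component.

```r
# Negative binomial variance for mean mu and dispersion phi.
nb_var <- function(mu, phi) mu + phi * mu^2

mu  <- 100                 # hypothetical expected count for one gene
phi <- c(0, 1e-4, 1e-3)    # Poisson, and the two lower limits discussed

tab <- data.frame(
  dispersion = phi,
  variance   = nb_var(mu, phi),  # approximately 100, 101, 110
  cv         = sqrt(phi)         # 0, 0.01, about 0.032
)
print(tab)
```

At this count level, moving from a dispersion of 1e-4 to 1e-3 changes the extra-Poisson coefficient of variation from about 1% to about 3%.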


Best wishes
Gordon

Date: Thu, 15 Sep 2011 18:03:28 -0700
From: Sean Ruddy <srudd...@gmail.com>
To: bioc-sig-sequencing@r-project.org
Subject: [Bioc-sig-seq] edgeR tagwise estimates not converging to
        common estimate with large prior.n value

Hi,

Thanks in advance for any help. I have the latest R software (2.13.1) and
edgeR software (2.8.4). I'm running into a problem where I estimate a common
dispersion parameter of 0.0001 and when I subsequently estimate tagwise
dispersions using the default prior.n = 10, the summary statistics are

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.001   0.001   0.001   0.001   0.001   0.022

i.e., nearly all estimates are 10 times larger than the common dispersion estimate.
Since the method is supposed to shrink toward the common value this seems a
little surprising. When I increase prior.n to a large number I expect the
tagwise estimates to all converge to the common dispersion, but as you might
guess from the table above it converges to 0.001 = 10*common.

The data comes from the Bioconductor package "yeastRNASeq" and it appears
from the description of the data that the two samples in each group are
actually from sequencing the same extraction of mRNA, i.e., not biological and
not even really technical replicates. So the common dispersion should be
zero as the counts should follow the Poisson distribution.

I cannot explain the behavior of the estimates but I'm afraid it might be
something in the code so I'll include that below.

library(edgeR)
library(yeastRNASeq)
data( geneLevelData )
d <- DGEList( geneLevelData , group = c( rep( "Mutant" , 2 ) , rep( "Wild" , 2 ) ) )
d <- calcNormFactors( d )
d <- d[rowSums(d$counts) >= 5, ]
d <- estimateCommonDisp( d )

d$common.dispersion
[1] 0.000101

d <- estimateTagwiseDisp( d , prior.n = 10 )

summary( d$tagwise.dispersion )
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.001   0.001   0.001   0.001   0.001   0.022

d <- estimateTagwiseDisp( d , prior.n = 1000 )

summary( d$tagwise.dispersion )
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.001   0.001   0.001   0.001   0.001   0.001

It could just be an oddity of the data set itself but I don't have enough
experience using edgeR across different RNA-Seq experiments to know how
these methods should behave.

Thanks,
Sean


_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
