Dear Sean,

The dispersion estimation functions in edgeR have a lower limit for the dispersions that they will estimate. For estimateCommonDisp(), the lower limit is just above 0.0001. For estimateTagwiseDisp() the lower limit is just above 0.001. For your data, the ideal dispersion estimate appears to be zero, so the functions are simply returning to you the pre-set lower limits.

I agree that it was a bit sloppy of us (the edgeR authors) to leave the lower limits inconsistent between the functions. estimateTagwiseDisp() has the higher limit because it does a grid search, and we wanted to limit the number of grid points for computational efficiency.
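
As a rough illustration of the boundary effect (this is a base-R sketch, not the edgeR source code), consider toy counts whose variance is below their mean, so that the best-fitting negative binomial dispersion is zero. The counts, the helper nb_loglik(), and the search ranges below are assumptions chosen for the example. A bounded search with a lower limit of 1e-4 stops just above that limit, and a grid search whose smallest grid point is 1e-3 returns exactly that grid point.

```r
# Toy counts with variance below the mean, so the best-fitting
# negative binomial dispersion is zero (hypothetical data, not the yeast counts).
y  <- c(10, 10, 11, 9, 10, 10)
mu <- mean(y)

# Negative binomial log-likelihood as a function of the dispersion phi,
# using the parametrisation var = mu + phi * mu^2 (i.e. size = 1 / phi).
nb_loglik <- function(phi) {
  sum(dnbinom(y, size = 1 / phi, mu = mu, log = TRUE))
}

# Bounded one-dimensional search with an assumed lower limit of 1e-4.
opt_est <- optimize(nb_loglik, interval = c(1e-4, 1),
                    maximum = TRUE, tol = 1e-8)$maximum

# Grid search with an assumed smallest grid point of 1e-3.
grid     <- seq(1e-3, 1, length.out = 200)
grid_est <- grid[which.max(sapply(grid, nb_loglik))]

print(c(optimize = opt_est, grid = grid_est))
```

Neither value is a meaningful estimate here. Both simply report where the search was allowed to start.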

The new glm functions in edgeR (estimateGLMCommonDisp() etc.) have somewhat less restrictive lower limits than the classic functions that you are using.

The bottom line is that with technical data such as the yeast data, we do not view the difference between dispersion estimates of 1e-3 and 1e-4 as scientifically meaningful. We would simply observe that the dispersion appears to be at the lower boundary, showing that the data have essentially no biological variability. We would set the dispersions to be zero.

Best wishes
Gordon

Date: Thu, 15 Sep 2011 18:03:28 -0700
From: Sean Ruddy <srudd...@gmail.com>
To: bioc-sig-sequencing@r-project.org
Subject: [Bioc-sig-seq] edgeR tagwise estimates not converging to
        common estimate with large prior.n value

Hi,

Thanks in advance for any help. I have the latest R software (2.13.1) and
edgeR software (2.8.4). I'm running into a problem where I estimate a common
dispersion parameter of 0.0001 and when I subsequently estimate tagwise
dispersions using the default prior.n = 10, the summary statistics are

 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.001   0.001   0.001   0.001   0.001   0.022

i.e., all estimates are 10 times larger than the common dispersion estimate.
Since the method is supposed to shrink toward the common value this seems a
little surprising. When I increase prior.n to a large number I expect the
tagwise estimates to all converge to the common dispersion, but as you might
guess from the table above it converges to 0.001 = 10*common.

The data comes from the Bioconductor package "yeastRNASeq" and it appears
from the description of the data that the two samples in each group are
actually from sequencing the same extraction of mRNA, i.e., not biological and
not even really technical replicates. So the common dispersion should be
zero as the counts should follow the Poisson distribution.

I cannot explain the behavior of the estimates but I'm afraid it might be
something in the code so I'll include that below.

library(edgeR)
library(yeastRNASeq)
data( geneLevelData )
d <- DGEList( geneLevelData , group = c( rep( "Mutant" , 2 ) , rep( "Wild" , 2 ) ) )
d <- calcNormFactors( d )
d <- d[rowSums(d$counts) >= 5, ]
d <- estimateCommonDisp( d )

d$common.dispersion
[1] 0.000101

d <- estimateTagwiseDisp( d , prior.n = 10 )

summary( d$tagwise.dispersion )
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.001   0.001   0.001   0.001   0.001   0.022

d <- estimateTagwiseDisp( d , prior.n = 1000 )

summary( d$tagwise.dispersion )
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.001   0.001   0.001   0.001   0.001   0.001


It could just be an oddity of the data set itself but I don't have enough
experience using edgeR across different RNA-Seq experiments to know how
these methods should behave.


Thanks,
Sean


_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing