Re: [Bioc-sig-seq] Supplying own variance functions and adjusted counts to a DESeq dataset

Simon Anders Sat, 16 Jul 2011 04:00:58 -0700

[repost, as my original post of yesterday somehow got dropped by the mailing list manager]


Hi Sean


On 2011-07-14 21:54, Sean Ruddy wrote:

> I have a RNA-Seq count data set that requires separate offset values for
> each tag and sample. DESeq does not appear to take a matrix of offset values
> (unlike edgeR) in any of its functions so I've carried out the analysis
> manually, ie. calculating a size factor for each tag of each sample,
> adjusting the counts, then proceeding to calculate means and variances of
> the adjusted counts, and finally fitting a curve for each condition to the
> mean-var plot using locfit().
>
> Essentially, I'd like to put these variance functions (or at least all the
> predicted variances) and adjusted counts inside a DESeq object so that I can
> take advantage of the other functions DESeq offers, tests, plots, etc...

We refactored things a bit in the devel version, and it is now easier to inject your own variance estimates.

If you now run 'estimateDispersions', it adds columns 'disp_<cond>' (where <cond> is the name of a condition, or "pooled" or "blind", depending on the "method" argument) to the feature data slot. If you want to use your own dispersion estimation scheme, you can just put values there, and the testing functions will use them.
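
If your own scheme produces variances rather than dispersions, they would need converting before going into a 'disp_<cond>' column. A minimal sketch of that arithmetic follows, written in plain Python rather than against DESeq's R interface. It assumes the negative-binomial relation var = mean + disp * mean^2 on the common scale, with all size factors taken as 1 for simplicity; the function name, the floor value and the numbers are illustrative assumptions, not part of DESeq.

```python
def dispersion_from_mean_var(mean, var, floor=1e-8):
    """Solve var = mean + disp * mean**2 for disp, for a single gene.

    Assumption: counts are on the common scale with size factors of 1.
    Genes whose variance does not exceed the mean get the small positive
    floor instead of a zero or negative dispersion.
    """
    if mean <= 0:
        return float("nan")  # dispersion is undefined without expression
    disp = (var - mean) / mean ** 2
    return max(disp, floor)


# Made-up predicted means and variances for three genes.
means = [10.0, 100.0, 1000.0]
variances = [30.0, 1100.0, 41000.0]

dispersions = [dispersion_from_mean_var(m, v) for m, v in zip(means, variances)]
print(dispersions)
```

The resulting per-gene values are the kind of numbers one would then place in the dispersion column for the relevant condition.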

However, I understand that you are actually happy with the estimation; you just want to pass gene-specific size factors, presumably to correct for GC biases. Our planned next step in our refactoring effort was to offer a slot where you would pass a matrix of values, of the same dimensions as the count table, which will be multiplied by the size factors each time they are used. From your post, I learned that the edgeR authors were again faster than we ;-) and have already added such a feature. As demand for this will increase (e.g. to interface to the new 'cqn' package that Hansen, Irizarry and Wu announced in their recent preprint), we had better add it, too, I guess.
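
To make the idea of such a slot concrete, here is a small sketch of the arithmetic in plain Python (it is not DESeq code, and the slot does not exist yet). Each entry of a gene-by-sample factor matrix multiplies the size factor of its sample, and the counts are divided by the product to bring them to a common scale. All function names and numbers are hypothetical.

```python
def effective_factors(size_factors, gene_factors):
    """Multiply each row of the gene-by-sample matrix by the size factors."""
    n_samples = len(size_factors)
    for row in gene_factors:
        if len(row) != n_samples:
            raise ValueError("factor matrix must have one column per sample")
    return [[g * s for g, s in zip(row, size_factors)] for row in gene_factors]


def adjust_counts(counts, factors):
    """Divide every count by its gene- and sample-specific factor."""
    return [[c / f for c, f in zip(count_row, factor_row)]
            for count_row, factor_row in zip(counts, factors)]


# Two genes (rows) and two samples (columns), made-up values.
counts = [[100, 220], [30, 45]]
size_factors = [1.0, 2.0]                  # one per sample
gene_factors = [[1.0, 1.1], [0.5, 0.75]]   # same dimensions as counts

factors = effective_factors(size_factors, gene_factors)
adjusted = adjust_counts(counts, factors)
print(adjusted)
```

With a matrix of ones, this reduces to the usual division by the per-sample size factors.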

Until then, have a look at the source code of DESeq: you will notice that we cleanly separated the interface functions that deal with the CountDataSet objects from the calculation functions that just work on matrices. So, if you want to use functionality that should be there but is hard to use due to the format of the CountDataSet object, you can typically call the core function directly. For example, the function 'estimateAndFitDispersionsFromBaseMeansAndVariances' takes a list of mean and dispersion and returns a mean-dispersion fit.


  Simon

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
