[repost, as my original post of yesterday somehow got dropped by the mailing list manager]

Hi Sean

On 2011-07-14 21:54, Sean Ruddy wrote:
> I have an RNA-Seq count data set that requires separate offset values for
> each tag and sample. DESeq does not appear to take a matrix of offset values
> (unlike edgeR) in any of its functions, so I've carried out the analysis
> manually, i.e. calculating a size factor for each tag of each sample,
> adjusting the counts, then proceeding to calculate means and variances of
> the adjusted counts, and finally fitting a curve for each condition to the
> mean-var plot using locfit().
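The manual procedure described above can be sketched roughly as follows. This is a minimal Python/NumPy sketch, not DESeq code. The function names are hypothetical. The shot-noise term generalises DESeq's mean(1/s_j) correction to a gene- and sample-specific size-factor matrix s_ij, which is an assumption of this sketch. Call it once per condition, on that condition's columns only. The curve-fitting step is left out.

```python
import numpy as np

def adjusted_means_and_variances(counts, size_factors):
    """Per-gene mean and variance of counts divided by a gene x sample
    size-factor matrix of the same shape (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    size_factors = np.asarray(size_factors, dtype=float)
    if counts.shape != size_factors.shape:
        raise ValueError("counts and size_factors must have the same shape")
    adjusted = counts / size_factors           # element-wise adjustment
    means = adjusted.mean(axis=1)
    # Sample variance (like R's var()); needs at least two samples
    variances = adjusted.var(axis=1, ddof=1)
    # Assumed Poisson (shot-noise) share of the variance of adjusted counts:
    # mean * average of 1/s_ij over the samples of this gene
    shot_noise = means * (1.0 / size_factors).mean(axis=1)
    return means, variances, shot_noise

def raw_dispersions(means, variances, shot_noise):
    """Method-of-moments dispersion (w - z) / q^2, clipped at zero;
    NaN for genes with zero mean."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    shot_noise = np.asarray(shot_noise, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        disp = (variances - shot_noise) / means**2
        return np.where(means > 0, np.maximum(disp, 0.0), np.nan)
```

For example, with counts [[10, 30], [10, 20]] and size factors [[1, 1], [1, 2]], the second gene's adjusted counts are identical, so its raw dispersion is clipped to zero.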

> Essentially, I'd like to put these variance functions (or at least all the
> predicted variances) and adjusted counts inside a DESeq object so that I can
> take advantage of the other functions DESeq offers: tests, plots, etc.

We refactored things a bit in the devel version, and it is now easier to inject your own variance estimates.

If you now run 'estimateDispersions', it adds columns 'disp_<cond>' (where <cond> is the name of a condition, or "pooled" or "blind", depending on the "method" argument) to the feature data slot. If you want to use your own dispersion estimation scheme, you can simply put your own values there, and the testing functions will use them.

However, I understand that you are actually happy with the estimation; you just want to pass gene-specific size factors, presumably to correct for GC biases. The planned next step in our refactoring effort was to offer a slot where you could pass a matrix of values, of the same dimensions as the count table, which would be multiplied by the size factors each time they are used. From your post, I learned that the edgeR authors were again faster than we were ;-) and have already added such a feature. As demand for this will increase (e.g. to interface with the new 'cqn' package that Hansen, Irizarry and Wu announced in their recent preprint), we had better add it, too, I guess.
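The planned slot amounts to an element-wise product. Each sample's size factor s_j is multiplied by a gene- and sample-specific factor N_ij, giving an effective factor s_j * N_ij for every cell of the count table. Below is a minimal NumPy sketch of that idea. It assumes the median-of-ratios estimator that DESeq uses for s_j. The function names are hypothetical, and this is not the interface DESeq will actually offer.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample size factors: median over genes of count / geometric mean,
    using only genes with nonzero counts in every sample."""
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)          # genes with finite log geometric mean
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

def effective_size_factors(size_factors, norm_factors):
    """Hypothetical combination: multiply the per-sample size factors by a
    gene x sample matrix of normalization factors (e.g. GC-bias corrections)."""
    norm_factors = np.asarray(norm_factors, dtype=float)
    sf = np.asarray(size_factors, dtype=float)
    return norm_factors * sf[np.newaxis, :]    # broadcast s_j across genes
```

The resulting matrix has the shape of the count table and can be used wherever a single per-sample size factor would otherwise divide the counts.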

Until then, have a look at the source code of DESeq. You will notice that we cleanly separated the interface functions that deal with CountDataSet objects from the calculation functions that work on plain matrices. So, if you want to use a functionality that should be there but is hard to reach because of the format of the CountDataSet object, you can typically call the core function directly. For example, the function 'estimateAndFitDispersionsFromBaseMeansAndVariances' takes vectors of means and variances, as its name indicates, and returns a mean-dispersion fit.
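To illustrate the kind of matrix-level calculation such core functions perform, here is a minimal sketch that fits a mean-dispersion curve from plain vectors of means and raw dispersions. It is not DESeq's implementation. The parametric form alpha(mu) = a0 + a1/mu and the ordinary least-squares fit are assumptions chosen for brevity, standing in for the local regression (locfit) mentioned above. The function name is hypothetical.

```python
import numpy as np

def fit_mean_dispersion(means, dispersions):
    """Fit alpha(mu) = a0 + a1 / mu by ordinary least squares
    (a simple stand-in for a local regression fit)."""
    means = np.asarray(means, dtype=float)
    dispersions = np.asarray(dispersions, dtype=float)
    ok = (means > 0) & np.isfinite(dispersions)   # drop zero-mean / NaN genes
    design = np.column_stack([np.ones(ok.sum()), 1.0 / means[ok]])
    (a0, a1), *_ = np.linalg.lstsq(design, dispersions[ok], rcond=None)

    def disp_func(mu):
        # Fitted dispersion at mean mu, floored at a small positive value
        return np.maximum(a0 + a1 / np.asarray(mu, dtype=float), 1e-8)

    return disp_func, (a0, a1)
```

The returned function can be evaluated at any mean, e.g. to obtain a fitted dispersion for each gene before testing.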

  Simon

_______________________________________________
Bioc-sig-sequencing mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
