On Mon, Mar 1, 2010 at 7:08 AM, Michael Lawrence <[email protected]> wrote:
> Hey guys,
>
> I'm wondering if anyone has given any thought to some sort of generic
> framework for chipseq analysis in Bioconductor, based on the IRanges,
> Biostrings, etc infrastructure. chipseq has some nice utilities; could it be
> transformed into some sort of generic chipseq pipeline? Something like how
> the 'affy' package (I think?) allows other packages to provide alternative
> implementations for particular stages. Just having a clean, refined,
> approximately complete set of chipseq-focused utilities would be nice.
> Presumably chipseq could fill that role? I think we now have a good idea of
> the basic steps in chipseq analysis, so it's probably time for such a
> package to emerge.
>
> Comments?

Good idea of course, but will need thought. We should probably start with
identifying typical stages of the analysis, and formulating suitable data
structures. What we have now is:

- Data I/O and QA: External software + ShortRead

- Data reduction: Is "GenomeDataList" good, or do we want something
  else as an intermediate on-disk storage format?

- Modeling + Peak Calling: Is coverage the right abstraction? We have one
  method based on coverage, but not all methods are.

I'm also not sure how much of this can be put into a framework. For
example, it's not clear how genomic annotation can be incorporated. One
can call peaks and then "intersect" with promoter regions, or bypass
peak-calling and start directly with promoter regions. In the chipseq
package, we basically gave up trying to formalize this, and made it
free-for-all after the data reduction step. I'm not sure we can do better
unless we restrict to specific pipelines.

-Deepayan

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
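A minimal sketch of the two routes mentioned above for bringing in genomic
annotation: calling peaks on coverage and then intersecting them with
promoter regions, or skipping peak calling and summarizing coverage over the
promoters directly. This is plain Python for illustration only, not the
chipseq or IRanges API; the function names, the depth threshold, and the toy
reads and promoter intervals are all assumptions.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def coverage(reads: List[Interval], length: int) -> List[int]:
    """Per-base read depth, computed with a difference array."""
    diff = [0] * (length + 1)
    for start, end in reads:
        s = min(max(start, 0), length)
        e = min(max(end, 0), length)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    cov, depth = [], 0
    for d in diff[:length]:
        depth += d
        cov.append(depth)
    return cov


def call_peaks(cov: List[int], min_depth: int) -> List[Interval]:
    """Maximal runs of positions whose depth is at least min_depth."""
    peaks, start = [], None
    for pos, depth in enumerate(cov):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(cov)))
    return peaks


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def peaks_in_regions(peaks: List[Interval], regions: List[Interval]) -> List[Interval]:
    """Route 1: keep only the peaks that overlap some annotated region."""
    return [p for p in peaks if any(overlaps(p, r) for r in regions)]


def max_depth_in_regions(cov: List[int], regions: List[Interval]) -> List[int]:
    """Route 2: no peak calling, just summarize coverage per region."""
    return [max(cov[s:e], default=0) for s, e in regions]


# Toy data (hypothetical): extended reads and promoter intervals.
reads = [(2, 8), (4, 10), (5, 12), (20, 26), (22, 28)]
promoters = [(0, 6), (14, 18)]

cov = coverage(reads, length=30)
peaks = call_peaks(cov, min_depth=2)
print(peaks_in_regions(peaks, promoters))
print(max_depth_in_regions(cov, promoters))
```

Both routes consume the same coverage vector, which is why coverage is a
natural hand-off point after data reduction; a peak caller that is not
coverage-based would need a different intermediate.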
