On Mon, Mar 1, 2010 at 4:46 PM, Raphael Gottardo <[email protected]> wrote:
> Hi Michael (and others),
>
> I would certainly second that. You guys have developed great tools for
> low-level analysis of next-gen data, but higher-level analyses are still
> lagging behind. This is rather normal, though, as the higher-level stuff
> needs the lower-level infrastructure.
>
> My group has been working on several aspects of ChIP-seq analysis and,
> to some extent, gene regulation.
> As noted in one of the emails this morning, we are about to submit our
> PICS software, based on a version of this paper
> http://arxiv.org/abs/0903.3206, which we hope will be published in
> Biometrics in the near future. For our package we have used some of the
> infrastructure available in the chipseq package and IRanges.
>
> One problem we have faced is data input. In ChIP-seq, one does not need
> the read sequences. However, when you use ShortRead you automatically get
> the read sequences as well, which take a lot of memory. For some highly
> sequenced data we have, this has been somewhat of a bottleneck.
> So it would be nice to be able to read only the chr/start/strand
> information. As pointed out by Wolfgang, Rsamtools might be the solution,
> so we will have to see how we can use Rsamtools and the classes defined
> there for ChIP-seq. That being said, we still have a lot of files from
> non-MAQ aligners.
> I think Arnaud Droit, who is in my group, has sent an email about this
> issue already.
>
> Besides PICS, which will be submitted this week, we have already released
> a package for motif analyses, rGADEM, which can work on standard
> Biostrings objects. rGADEM is relatively fast and well adapted for
> ChIP-seq enriched regions. We also have another package, MotIV, for motif
> validation and identification, which is based on STAMP (with many
> improved functionalities). MotIV is under review, I believe, and should
> be available soon.
>
> Anyway, so very soon we will have a complete pipeline from shortread ->
> enriched regions (PICS) -> motifs (rGADEM) -> validated motifs and motif
> occurrences (MotIV) -> other BioC packages (e.g. GenomicFeatures, etc).
>

This all sounds exciting. I want to clarify, though, that I am not
proposing that Bioconductor adopt a particular pipeline or set of methods,
but rather that we provide an extra layer of infrastructure to make
developing pipelines easier.

> So at least this will be a start. Of course we are open to
> suggestions/requests, etc. If any of you guys want more details, feel
> free to drop us an email.
>
> Cheers,
>
> Raphael
>
> On 2010-03-01, at 10:08 AM, Michael Lawrence wrote:
>
> > Hey guys,
> >
> > I'm wondering if anyone has given any thought to some sort of generic
> > framework for chipseq analysis in Bioconductor, based on the IRanges,
> > Biostrings, etc infrastructure. chipseq has some nice utilities; could
> > it be transformed into some sort of generic chipseq pipeline? Something
> > like how the 'affy' package (I think?) allows other packages to provide
> > alternative implementations for particular stages. Just having a clean,
> > refined, approximately complete set of chipseq-focused utilities would
> > be nice. Presumably chipseq could fill that role? I think we now have a
> > good idea of the basic steps in chipseq analysis, so it's probably time
> > for such a package to emerge.
> >
> > Comments?
> >
> > Michael
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
