Re: [Bioc-sig-seq] chipseq infrastructure

Raphael Gottardo Tue, 02 Mar 2010 09:31:44 -0800

Hi Michael (and others),

I would certainly second that. You guys have developed great tools for low-level 
analysis of next-gen data, but higher-level analyses are still lagging behind. 
This is rather normal, though, as the higher-level stuff needs the lower-level 
infrastructure.

My group has been working on several aspects of ChIP-seq analysis and, to some 
extent, gene regulation. 
As noted in one of the emails this morning, we are about to submit our PICS 
software, based on a version of this paper, http://arxiv.org/abs/0903.3206, which 
we hope will be published in Biometrics in the near future. For our package we 
have used some of the infrastructure available in the chipseq package and 
IRanges. 

One problem we have faced is data input. In chipseq, one does not need 
the read sequences. However, when you use ShortRead you automatically get the 
read sequences, which take a lot of memory. For some of our deeply sequenced 
data, this has been somewhat of a bottleneck.
So it would be nice to be able to read only the chr/start/strand information. 
As pointed out by Wolfgang, Rsamtools might be the solution, so we will have to 
see how we can use Rsamtools and the classes defined there for ChIP-seq. That 
being said, we still have a lot of files from non-MAQ aligners.
I think Arnaud Droit, who is in my group, has already sent an email about this 
issue.
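
To make the idea concrete, here is a minimal sketch of the kind of reader we have 
in mind. It is not PICS, chipseq, or Rsamtools code. It assumes the alignments are 
available as SAM text, and the function name read_positions is made up for 
illustration. It streams the file and keeps only chr/start/strand, so the read 
sequences and qualities are never stored.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_positions(handle: TextIO) -> Iterator[Tuple[str, int, str]]:
    """Yield (chromosome, start, strand) for each mapped read in SAM text.

    Hypothetical helper for illustration only. 'start' is the SAM POS field,
    i.e. the 1-based leftmost aligned position, for both strands.
    """
    for line in handle:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        # Split off only the first four columns (QNAME, FLAG, RNAME, POS);
        # the sequence and quality strings stay in the unsplit remainder.
        fields = line.rstrip("\n").split("\t", 4)
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        strand = "-" if flag & 0x10 else "+"  # 0x10 marks reverse strand
        yield fields[2], int(fields[3]), strand


# Tiny in-memory example: one forward read, one reverse read, one unmapped.
sam = (
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t16\tchr1\t250\t30\t4M\t*\t0\t0\tTTGA\tIIII\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n"
)
positions = list(read_positions(io.StringIO(sam)))
```

Because the reader is a generator, a caller can consume the positions one at a 
time without ever holding the whole file in memory.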

Besides PICS, which will be submitted this week, we have already released a 
package for motif analysis, rGADEM, which works on standard Biostrings 
objects. rGADEM is relatively fast and well adapted to ChIP-seq enriched 
regions. We also have another package, MotIV, for motif validation and 
identification, which is based on STAMP (with many improved 
functionalities). MotIV is under review, I believe, and should be available soon.

Anyway, very soon we will have a complete pipeline from short reads -> 
enriched regions (PICS) -> motifs (rGADEM) -> validated motifs and motif 
occurrences (MotIV) -> other BioC packages (e.g. GenomicFeatures, etc.).

So at least this will be a start. Of course we are open to 
suggestions, requests, etc. If any of you want more details, feel free to 
drop us an email.

Cheers,

Raphael

On 2010-03-01, at 10:08 AM, Michael Lawrence wrote:

> Hey guys,
> 
> I'm wondering if anyone has given any thought to some sort of generic
> framework for chipseq analysis in Bioconductor, based on the IRanges,
> Biostrings, etc infrastructure. chipseq has some nice utilities; could it be
> transformed into some sort of generic chipseq pipeline? Something like how
> the 'affy' package (I think?) allows other packages to provide alternative
> implementations for particular stages. Just having a clean, refined,
> approximately complete set of chipseq-focused utilities would be nice.
> Presumably chipseq could fill that role? I think we now have a good idea of
> the basic steps in chipseq analysis, so it's probably time for such a
> package to emerge.
> 
> Comments?
> 
> Michael
> 
>       [[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
