On Thu, May 7, 2009 at 11:26 AM, Steve Lianoglou <[email protected]> wrote:

> Hi,
>
> On May 7, 2009, at 10:53 AM, Steve Goldstein wrote:
>
>> A simple permutation test could be done by selecting random sets of
>> intervals "matching" the query intervals and counting the number of overlaps
>> with the reference intervals. Each random set of intervals could be picked
>> so that the number and size of the intervals was the same as the query. A
>> general implementation of the method would need to know the length of each
>> chromosome.
>>
>
> I was just about to suggest something similar, though I didn't think to
> consider chromosome length ... can you give some intuition as to why that's
> important for this question?
>
> I guess you'd expect more "collisions" to happen at random on a chromosome
> if it's longer, but I think in general one wouldn't be interested in finding
> the number of reads that "collide" between two experiments for a particular
> chromosome as you might be interested in just seeing how many collisions
> happen over the entire extent of the genome ... so is it helpful to think of
> the genome as broken up into chromosome-pieces, or would it suffice to
> simply think of it as being one contiguous length of sequence for this
> purpose?
>

Just a big word of caution when thinking about these issues--the genome is
not "flat". The "mappability" of reads to the genome is a well-understood
reason for this non-randomness. However, there are clearly other sources
that make this problem much trickier than it might first seem; these other
sources depend (at least) on the sample preparation and the experimental
details. In short, choosing random intervals is almost certainly not going
to represent the null distribution.

Sean

>
>> Of course, if the null hypothesis for this permutation test (the sets of
>> intervals are not related) is rejected, then you have to think about the
>> next questions: To what degree are the sets related?
>>
>
> I'm picturing the aligned reads as painting small lines on a large canvas
> (the entire canvas is the genome in this analogy).
>
> If your first expt is painting red lines.
> Your second expt is painting in blue lines.
> The question is how much of the canvas is purple.
>
> So it "feels" like some sort of an enrichment test to me, could you try to
> answer this question in a similar fashion to GO enrichment, via some sort of
> hypergeometric test? That's not exactly correct ... just brainstorming is
> all.
>
>> Where do they differ and where are they the same?
>>
>
> I'll stop with my speculation here ... :-)
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> http://cbio.mskcc.org/~lianos
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
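
For concreteness, here is a rough sketch of the permutation test described in the quoted text, written in plain Python. It is only an illustration under stated assumptions: the function names (`count_overlaps`, `random_matching_set`, `permutation_p_value`), the `(chrom, start, end)` half-open interval tuples, and the `chrom_lengths` dictionary are all hypothetical, not from any existing package. Random intervals are placed uniformly, which is exactly the "flat genome" null cautioned against above, so treat any p-value it produces as a naive baseline. It does show where chromosome lengths come in: they bound where an interval of a given size may start, and they weight how often each chromosome is picked.

```python
import random


def count_overlaps(query, reference):
    """Count query intervals that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        if any(start < r_end and r_start < end
               for r_start, r_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits


def random_matching_set(query, chrom_lengths, rng):
    """Draw a random set with the same number and sizes of intervals as query.

    ASSUMPTION: uniform placement over the genome (ignores mappability etc.).
    """
    chroms = list(chrom_lengths)
    result = []
    for _, start, end in query:
        size = end - start
        # Weight each chromosome by its number of valid start positions,
        # so that longer chromosomes receive proportionally more intervals.
        weights = [max(chrom_lengths[c] - size + 1, 0) for c in chroms]
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        new_start = rng.randrange(chrom_lengths[chrom] - size + 1)
        result.append((chrom, new_start, new_start + size))
    return result


def permutation_p_value(query, reference, chrom_lengths, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    at_least = 0
    for _ in range(n_perm):
        shuffled = random_matching_set(query, chrom_lengths, rng)
        if count_overlaps(shuffled, reference) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)
```

Calling `permutation_p_value(query, reference, {"chr1": 1_000_000, "chr2": 500_000})` returns the observed number of overlapping query intervals together with the fraction of random sets that overlap at least as often. A less naive null would swap out `random_matching_set` for a sampler restricted to mappable regions.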
