On Thu, May 7, 2009 at 11:26 AM, Steve Lianoglou <
[email protected]> wrote:

> Hi,
>
> On May 7, 2009, at 10:53 AM, Steve Goldstein wrote:
>
>> A simple permutation test could be done by selecting random sets of
>> intervals "matching" the query intervals and counting the number of overlaps
>> with the reference intervals.  Each random set of intervals could be picked
>> so that the number and size of the intervals was the same as the query.   A
>> general implementation of the method would need to know the length of each
>> chromosome.
>>
>
> I was just about to suggest something similar, though I didn't think to
> consider chromosome length ... can you give some intuition as to why that's
> important for this question?
>
> I guess you'd expect more "collisions" to happen at random on a longer
> chromosome, but in general one would be less interested in the number of
> reads that "collide" between two experiments on a particular chromosome
> than in how many collisions happen over the entire extent of the genome.
> So is it helpful to think of the genome as broken up into chromosome-sized
> pieces, or would it suffice to treat it as one contiguous length of
> sequence for this purpose?
>

Just a big word of caution when thinking about these issues--the genome is
not "flat".  The "mappability" of reads to the genome is a well-understood
reason for this non-randomness.  However, there are clearly other sources
that make this problem much trickier than it might first seem; these other
sources depend (at least) on the sample preparation and the experimental
details.  In short, choosing random intervals is almost certainly not going
to represent the null distribution.
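
For concreteness, here is a minimal Python sketch of the naive test being
discussed: draw random interval sets with the same number and sizes as the
query, place them uniformly within chromosomes, and count overlaps with the
reference. The function names, the (chrom, start, end) half-open tuple
layout, and the chrom_lengths dictionary are all assumptions made for
illustration. Uniform placement is exactly the "flat genome" null cautioned
against above, so treat it as a baseline only.

```python
import random

def count_overlaps(query, reference):
    """Count query intervals that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates
    (an assumed layout, chosen for this sketch).
    """
    n = 0
    for chrom, start, end in query:
        if any(c == chrom and start < e and s < end for c, s, e in reference):
            n += 1
    return n

def random_matched_set(query, chrom_lengths, rng):
    """Draw intervals with the same number and sizes as the query.

    Chromosome lengths are needed so that each interval lands entirely
    inside one chromosome. Assumes at least one chromosome is long
    enough to hold every query interval.
    """
    chroms = list(chrom_lengths)
    out = []
    for _, start, end in query:
        size = end - start
        # Weight each chromosome by its number of valid start positions.
        weights = [max(chrom_lengths[c] - size + 1, 0) for c in chroms]
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        new_start = rng.randrange(chrom_lengths[chrom] - size + 1)
        out.append((chrom, new_start, new_start + size))
    return out

def permutation_pvalue(query, reference, chrom_lengths, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    hits = sum(
        count_overlaps(random_matched_set(query, chrom_lengths, rng), reference)
        >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A more realistic null would replace random_matched_set with a sampler that
respects mappability and the other biases mentioned above.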

Sean


>
>
>> Of course, if the null hypothesis for this permutation test (the sets of
>> intervals are not related) is rejected, then you have to think about the
>> next questions:  To what degree are the sets related?
>>
>
> I'm picturing the aligned reads as painting small lines on a large canvas
> (the entire canvas is the genome in this analogy).
>
> Your first experiment paints red lines.
> Your second experiment paints blue lines.
> The question is how much of the canvas is purple.
>
> So it "feels" like some sort of enrichment test to me. Could you try to
> answer this question in a similar fashion to GO enrichment, via some sort of
> hypergeometric test? That's not exactly correct ... just brainstorming is
> all.
>
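
To make the hypergeometric idea concrete, here is a rough Python sketch. It
assumes the canvas is cut into N equal-sized bins, of which K contain red,
n contain blue, and k contain both (purple). The binning and the function
name are assumptions for illustration, not something proposed in the
thread. The test treats every bin as equally likely to be painted, which is
the same "flat genome" assumption cautioned against above.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total bins, K: bins covered by the first experiment (red),
    n: bins covered by the second experiment (blue),
    k: bins covered by both (purple).
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability mass of seeing k or more shared bins by chance.
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    ) / total
```

A small tail probability would suggest more purple than expected under this
simplified null. Exact integer arithmetic like this gets slow for
genome-scale N, so it is only meant for small examples.
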
>> Where do they differ and where are they the same?
>>
>
> I'll stop with my speculation here ... :-)
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> http://cbio.mskcc.org/~lianos
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>


