Hello Steve, Nicolas and Michael,

I agree with all of you: it is not a trivial question.
I asked the bioc-sig-seq list members because I thought, "Hey, this must be an everyday question for the genome analyst." Say you ran your ChIP-seq under condition A and then ran it under condition B. Then you have to decide whether A and B made any difference. It doesn't get any simpler than that!

I can't compare the two means or the two dispersions; I have to compare pairs. The problem is that it is not trivial to determine unambiguously which spot in B should be paired with each spot in A. To start with, A and B may have different numbers of loci (e.g., 15000 versus 18000).

I'll take a look at genomeIntervals and IRanges.

By the way, Michael, would you let me know as soon as the new IRanges documentation comes out? I understand you guys were working on something.

Thank you all,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence <[email protected]> wrote:
>
>
> On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti <[email protected]> wrote:
>>
>> Hello Bioc-sig-seq,
>>
>> Say you run your ChIP-seq and find binding positions like this
>>
>> chr1 3660781 3662707
>> chr1 4481742 4482656
>> chr1 4482813 4484003
>> chr1 4561320 4562262
>> chr1 4774887 4776304
>> chr1 4797291 4798822
>> chr1 4847807 4848846
>> chr1 5008093 5009386
>> chr1 5009514 5010046
>> chr1 5010095 5010583
>> ...[many more loci and chromosomes]...
>>
>> Then you want to compare it to published data like this
>>
>> chr1 3659579 3662079
>> chr1 4773791 4776291
>> chr1 4797473 4799973
>> chr1 4847394 4849894
>> chr1 5007460 5009960
>> chr1 5072753 5075253
>> chr1 6204242 6206742
>> chr1 7078730 7081230
>> chr1 9282452 9284952
>> chr1 9683423 9685923
>> ...[many more loci and chromosomes]...
>>
>> What method would you use to test whether these two lists are
>> significantly different?
>
> This is a tough statistical question that probably needs to be a bit more
> specific, but as far as technical tools, in addition to genomeIntervals
> there is the IRanges package and its efficient "overlap" function. IRanges
> is well integrated with the rest of sequence analysis infrastructure in
> Bioconductor.
>
>>
>> Any pointer would be appreciated.
>>
>> Ivan
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
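To make the pairing problem concrete, here is a minimal Python sketch of what an overlap query returns for the two excerpts in the thread. It is not the IRanges API; the function name `overlap_pairs` is hypothetical, and it assumes each locus is a `(chromosome, start, end)` tuple with inclusive coordinates on both ends (BED files are half-open, so adjust if your data come from BED). It reports every pair of overlapping loci, so one locus may appear in several pairs, or in none.

```python
from bisect import bisect_right
from collections import defaultdict


def overlap_pairs(query, subject):
    """Return sorted (i, j) pairs where query[i] overlaps subject[j].

    Each interval is (chrom, start, end). Coordinates are treated as
    inclusive on both ends (an assumption; BED files are half-open).
    """
    # Group subject intervals by chromosome, sorted by start position.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(subject):
        by_chrom[chrom].append((start, end, j))
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _, _ in ivs]

    pairs = []
    for i, (chrom, q_start, q_end) in enumerate(query):
        ivs = by_chrom.get(chrom, [])
        # Only subject intervals starting at or before q_end can overlap.
        hi = bisect_right(starts.get(chrom, []), q_end)
        for s_start, s_end, j in ivs[:hi]:
            # Of those, keep the ones that end at or after q_start.
            if s_end >= q_start:
                pairs.append((i, j))
    return sorted(pairs)


# The two excerpts from the thread (first ten loci of each list).
ours = [
    ("chr1", 3660781, 3662707), ("chr1", 4481742, 4482656),
    ("chr1", 4482813, 4484003), ("chr1", 4561320, 4562262),
    ("chr1", 4774887, 4776304), ("chr1", 4797291, 4798822),
    ("chr1", 4847807, 4848846), ("chr1", 5008093, 5009386),
    ("chr1", 5009514, 5010046), ("chr1", 5010095, 5010583),
]
published = [
    ("chr1", 3659579, 3662079), ("chr1", 4773791, 4776291),
    ("chr1", 4797473, 4799973), ("chr1", 4847394, 4849894),
    ("chr1", 5007460, 5009960), ("chr1", 5072753, 5075253),
    ("chr1", 6204242, 6206742), ("chr1", 7078730, 7081230),
    ("chr1", 9282452, 9284952), ("chr1", 9683423, 9685923),
]

pairs = overlap_pairs(ours, published)
matched = {i for i, _ in pairs}
unmatched = [k for k in range(len(ours)) if k not in matched]
print(pairs)      # [(0, 0), (4, 1), (5, 2), (6, 3), (7, 4), (8, 4)]
print(unmatched)  # [1, 2, 3, 9]
```

Even on this small excerpt the pairing is not one-to-one: published locus 4 overlaps two of our loci (7 and 8), and four of our loci overlap nothing. An overlap query gives candidate pairs; deciding how to collapse them into a single matching, and which statistic to test, is the harder question raised in the thread. The scan inside each chromosome is quadratic in the worst case; interval trees or sorted sweeps (as used by dedicated tools) scale better to tens of thousands of loci.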
