Re: [Bioc-sig-seq] Comparing two chipseq position sets

Nicolas Delhomme Thu, 07 May 2009 08:50:53 -0700

Hi Ivan,

I'm not yet so familiar with iRanges, just starting to use it. By theway, I use the opportunity to thank the guys behing those libraries(ShortRead, chipseq, iRanges, etc.) as they are doing a tremendouswork. Chapeau bas messieurs!Back to your question, the fact that A and B have different size isnot a problem for genomeIntervals. The interval_overlap function willreturn a list, which for every interval of A will have a vector of theposition in B of the corresponding overlapping interval.I agree, this is confusing, so here is the bit of code applied to foryour example:


require(genomeIntervals)

A.set<-new("Genome_intervals",
                matrix(
                        c(3660781,3662707,
                        4481742,4482656,
                        4482813,4484003,
                        4561320,4562262,
                        4774887,4776304,
                        4797291,4798822,
                        4847807,4848846,
                        5008093,5009386,
                        5009514,5010046,
                        5010095,5010583),ncol=2,byrow=TRUE),
                closed=c(TRUE,TRUE),
                annotation=data.frame(
                seq_name=rep("chr1",10),
                inter_base=FALSE,
                strand="+"
                )
           )

B.set<-new("Genome_intervals",
                matrix(
                        c(
                         3659579,3662079,
                          4773791,4776291,
                           4797473,4799973,
                            4847394,4849894,
                             5007460,5009960,
                              5072753,5075253,
                               6204242,6206742,
                                7078730,7081230,
                                 9282452,9284952,
                                  9683423,9685923),ncol=2,byrow=TRUE),
                closed=c(TRUE,TRUE),
                annotation=data.frame(
                seq_name=rep("chr1",10),
                inter_base=FALSE,
                strand="+"
                )
           )

A.B.overlap<-interval_overlap(A.set,B.set)

A.B.overlap

[[1]]
[1] 1

[[2]]
integer(0)

[[3]]
integer(0)

[[4]]
integer(0)

[[5]]
[1] 2

[[6]]
[1] 3

[[7]]
[1] 4

[[8]]
[1] 5

[[9]]
[1] 5

[[10]]
integer(0)

As you can see the first interval in A overlap with the first in B;the fifth in A with the second in B and so on...

The comment from Steve are very good, I've read about it quite oftenin the literature, especially for ChIP-chip but I don't know ifthere's a package around. Let me know if you find/know about one.


Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: [email protected]
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------



On 7 May 2009, at 16:02, Ivan Gregoretti wrote:

Hello Steve, Nicolas and Michael,

I agree with all of you: it is not a trivial question.

I asked the bioc-sig-seq listers because I thought, --Hey, this must
be the everyday's question of the genome analyst.

Say you ran your chipseq under condition A and then you ran it under
condition B. Then you have to decide whether A and B made any
difference. It doesn't get any simpler than that!

I can't compare the two means or the two dispersions. I have to
compare pairs. The problem is that it is not trivial to unambiguously
determine which spot in B must be paired with each spot in A. To start
with, A and B may have different numbers of loci (ie 15000 versus
18000).

I'll take a look at genomeIntervals and IRanges.

By the way, Michael, would you let me know as soon as the new IRanges
documentation comes out? You guys were working on something, I
understand.

Thank you all,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence<[email protected]> wrote:

On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti<[email protected]> wrote:


Hello Bioc-sig-seq,

Say you run your ChIP-seq and find binding positions like this

chr1  3660781  3662707
chr1  4481742  4482656
chr1  4482813  4484003
chr1  4561320  4562262
chr1  4774887  4776304
chr1  4797291  4798822
chr1     4847807  4848846
chr1  5008093  5009386
chr1  5009514  5010046
chr1  5010095  5010583
...[many more loci and chromosomes]...

Then you want to compare it to published data like this

chr1  3659579  3662079
chr1  4773791  4776291
chr1  4797473  4799973
chr1  4847394  4849894
chr1  5007460  5009960
chr1  5072753  5075253
chr1  6204242  6206742
chr1  7078730  7081230
chr1  9282452  9284952
chr1  9683423  9685923
...[many more loci and chromosomes]...

What method would you use to test whether these two lists are
significantly different?

This is a tough statistical question that probably needs to be abit morespecific, but as far as technical tools, in addition togenomeIntervalsthere is the IRanges package and its efficient "overlap" function.IRangesis well integrated with the rest of sequence analysisinfrastructure in

Bioconductor.


Any pointer would be appreciated.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Comparing two chipseq position sets

Reply via email to