Hi Ivan,

I'm not yet so familiar with iRanges, just starting to use it. By the way, I use the opportunity to thank the guys behing those libraries (ShortRead, chipseq, iRanges, etc.) as they are doing a tremendous work. Chapeau bas messieurs! Back to your question, the fact that A and B have different size is not a problem for genomeIntervals. The interval_overlap function will return a list, which for every interval of A will have a vector of the position in B of the corresponding overlapping interval. I agree, this is confusing, so here is the bit of code applied to for your example:

require(genomeIntervals)

A.set<-new("Genome_intervals",
                matrix(
                        c(3660781,3662707,
                        4481742,4482656,
                        4482813,4484003,
                        4561320,4562262,
                        4774887,4776304,
                        4797291,4798822,
                        4847807,4848846,
                        5008093,5009386,
                        5009514,5010046,
                        5010095,5010583),ncol=2,byrow=TRUE),
                closed=c(TRUE,TRUE),
                annotation=data.frame(
                seq_name=rep("chr1",10),
                inter_base=FALSE,
                strand="+"
                )
           )

B.set<-new("Genome_intervals",
                matrix(
                        c(
                         3659579,3662079,
                          4773791,4776291,
                           4797473,4799973,
                            4847394,4849894,
                             5007460,5009960,
                              5072753,5075253,
                               6204242,6206742,
                                7078730,7081230,
                                 9282452,9284952,
                                  9683423,9685923),ncol=2,byrow=TRUE),
                closed=c(TRUE,TRUE),
                annotation=data.frame(
                seq_name=rep("chr1",10),
                inter_base=FALSE,
                strand="+"
                )
           )

A.B.overlap<-interval_overlap(A.set,B.set)

A.B.overlap

[[1]]
[1] 1

[[2]]
integer(0)

[[3]]
integer(0)

[[4]]
integer(0)

[[5]]
[1] 2

[[6]]
[1] 3

[[7]]
[1] 4

[[8]]
[1] 5

[[9]]
[1] 5

[[10]]
integer(0)

As you can see the first interval in A overlap with the first in B; the fifth in A with the second in B and so on...

The comment from Steve are very good, I've read about it quite often in the literature, especially for ChIP-chip but I don't know if there's a package around. Let me know if you find/know about one.

Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: [email protected]
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------



On 7 May 2009, at 16:02, Ivan Gregoretti wrote:

Hello Steve, Nicolas and Michael,

I agree with all of you: it is not a trivial question.

I asked the bioc-sig-seq listers because I thought, --Hey, this must
be the everyday's question of the genome analyst.

Say you ran your chipseq under condition A and then you ran it under
condition B. Then you have to decide whether A and B made any
difference. It doesn't get any simpler than that!

I can't compare the two means or the two dispersions. I have to
compare pairs. The problem is that it is not trivial to unambiguously
determine which spot in B must be paired with each spot in A. To start
with, A and B may have different numbers of loci (ie 15000 versus
18000).

I'll take a look at genomeIntervals and IRanges.

By the way, Michael, would you let me know as soon as the new IRanges
documentation comes out? You guys were working on something, I
understand.

Thank you all,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878



On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence <[email protected]> wrote:


On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti <[email protected]> wrote:

Hello Bioc-sig-seq,

Say you run your ChIP-seq and find binding positions like this

chr1  3660781  3662707
chr1  4481742  4482656
chr1  4482813  4484003
chr1  4561320  4562262
chr1  4774887  4776304
chr1  4797291  4798822
chr1     4847807  4848846
chr1  5008093  5009386
chr1  5009514  5010046
chr1  5010095  5010583
...[many more loci and chromosomes]...

Then you want to compare it to published data like this

chr1  3659579  3662079
chr1  4773791  4776291
chr1  4797473  4799973
chr1  4847394  4849894
chr1  5007460  5009960
chr1  5072753  5075253
chr1  6204242  6206742
chr1  7078730  7081230
chr1  9282452  9284952
chr1  9683423  9685923
...[many more loci and chromosomes]...

What method would you use to test whether these two lists are
significantly different?

This is a tough statistical question that probably needs to be a bit more specific, but as far as technical tools, in addition to genomeIntervals there is the IRanges package and its efficient "overlap" function. IRanges is well integrated with the rest of sequence analysis infrastructure in
Bioconductor.


Any pointer would be appreciated.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to