Hi Ivan,
I'm not yet so familiar with iRanges, just starting to use it. By the way, I
use the opportunity to thank the guys behing those libraries (ShortRead,
chipseq, iRanges, etc.) as they are doing a tremendous work. Chapeau bas
messieurs!
Back to your question, the fact that A and B have different size is not a
problem for genomeIntervals. The interval_overlap function will return a
list, which for every interval of A will have a vector of the position in B
of the corresponding overlapping interval.
I agree, this is confusing, so here is the bit of code applied to for your
example:
require(genomeIntervals)
A.set<-new("Genome_intervals",
matrix(
c(3660781,3662707,
4481742,4482656,
4482813,4484003,
4561320,4562262,
4774887,4776304,
4797291,4798822,
4847807,4848846,
5008093,5009386,
5009514,5010046,
5010095,5010583),ncol=2,byrow=TRUE),
closed=c(TRUE,TRUE),
annotation=data.frame(
seq_name=rep("chr1",10),
inter_base=FALSE,
strand="+"
)
)
B.set<-new("Genome_intervals",
matrix(
c(
3659579,3662079,
4773791,4776291,
4797473,4799973,
4847394,4849894,
5007460,5009960,
5072753,5075253,
6204242,6206742,
7078730,7081230,
9282452,9284952,
9683423,9685923),ncol=2,byrow=TRUE),
closed=c(TRUE,TRUE),
annotation=data.frame(
seq_name=rep("chr1",10),
inter_base=FALSE,
strand="+"
)
)
A.B.overlap<-interval_overlap(A.set,B.set)
A.B.overlap
[[1]]
[1] 1
[[2]]
integer(0)
[[3]]
integer(0)
[[4]]
integer(0)
[[5]]
[1] 2
[[6]]
[1] 3
[[7]]
[1] 4
[[8]]
[1] 5
[[9]]
[1] 5
[[10]]
integer(0)
As you can see the first interval in A overlap with the first in B; the
fifth in A with the second in B and so on...
The comment from Steve are very good, I've read about it quite often in the
literature, especially for ChIP-chip but I don't know if there's a package
around. Let me know if you find/know about one.
Best,
---------------------------------------------------------------
Nicolas Delhomme
High Throughput Functional Genomics Center
European Molecular Biology Laboratory
Tel: +49 6221 387 8426
Email: [email protected]
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On 7 May 2009, at 16:02, Ivan Gregoretti wrote:
Hello Steve, Nicolas and Michael,
I agree with all of you: it is not a trivial question.
I asked the bioc-sig-seq listers because I thought, --Hey, this must
be the everyday's question of the genome analyst.
Say you ran your chipseq under condition A and then you ran it under
condition B. Then you have to decide whether A and B made any
difference. It doesn't get any simpler than that!
I can't compare the two means or the two dispersions. I have to
compare pairs. The problem is that it is not trivial to unambiguously
determine which spot in B must be paired with each spot in A. To start
with, A and B may have different numbers of loci (ie 15000 versus
18000).
I'll take a look at genomeIntervals and IRanges.
By the way, Michael, would you let me know as soon as the new IRanges
documentation comes out? You guys were working on something, I
understand.
Thank you all,
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence <[email protected]>
wrote:
On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti <[email protected]>
wrote:
Hello Bioc-sig-seq,
Say you run your ChIP-seq and find binding positions like this
chr1 3660781 3662707
chr1 4481742 4482656
chr1 4482813 4484003
chr1 4561320 4562262
chr1 4774887 4776304
chr1 4797291 4798822
chr1 4847807 4848846
chr1 5008093 5009386
chr1 5009514 5010046
chr1 5010095 5010583
...[many more loci and chromosomes]...
Then you want to compare it to published data like this
chr1 3659579 3662079
chr1 4773791 4776291
chr1 4797473 4799973
chr1 4847394 4849894
chr1 5007460 5009960
chr1 5072753 5075253
chr1 6204242 6206742
chr1 7078730 7081230
chr1 9282452 9284952
chr1 9683423 9685923
...[many more loci and chromosomes]...
What method would you use to test whether these two lists are
significantly different?
This is a tough statistical question that probably needs to be a bit more
specific, but as far as technical tools, in addition to genomeIntervals
there is the IRanges package and its efficient "overlap" function.
IRanges
is well integrated with the rest of sequence analysis infrastructure in
Bioconductor.
Any pointer would be appreciated.
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing