P. Terry,
In your example, the GR_txdb object is the query, not the subject, in
the overlap calculation. The following code should get you what you are
looking for:
OL<- findOverlaps(GR_txdb, r_gr_ChSeqPks)
tdata<- GR_txdb[queryHits(OL),]
Patrick
On 4/5/10 11:02 AM, [email protected] wrote:
Dear bioc-sig-sequencing,
Working thru the GenomicFeatures vignette dated 03/27/10 with an Arabidopsis
example, I applied 'findOverlaps' to learn which chipseq peaks will overlap
with Arabidopsis transcrips. In attempting to manually verify the results from
'findOverlaps', none of the ranges in the chipSeq peaks list overlaps the
ranges from for example, the first record of the 'findOverlaps' output.
That is, the lowest address range in the chipSeq peaks (all chr1) list is
617092-617094. The 1st recond in the output from 'findOverlaps' is the
transcript, 28500-28706. Can someone offer an explanation of why
'findOverlaps' reports a transcript with this location as an overlap for any of
the peaks in the chipSeq peak list?
GR_txdb<- transcripts(mart4_at_eg_gene)
GR_txdb
GRanges with 39640 ranges and 2 elementMetadata values
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> |<integer> <character>
[1] 1 [ 3631, 5899] + | 5480 AT1G01010.1-TAIR
[2] 1 [23146, 31227] + | 3216 AT1G01040.1-TAIR
[3] 1 [28500, 28706] + | 8461 AT1G01046.1-TAIR
[4] 1 [44677, 44787] + | 3566 AT1G01073.1-TAIR
...
r_gr_ChSeqPks
GRanges with 57 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] 1 [ 617092, 617094] * |
[2] 1 [1808262, 1808262] * |
[3] 1 [3889445, 3889452] * |
[4] 1 [4404410, 4404410] * |
...
OL<- findOverlaps(GR_txdb, r_gr_ChSeqPks)
tdata<- GR_txdb[subjectHits(OL),]
tdata
GRanges with 43 ranges and 2 elementMetadata values
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> |<integer> <character>
[1] 1 [ 28500, 28706] + | 8461 AT1G01046.1-TAIR
[2] 1 [ 88898, 89745] + | 6886 AT1G01210.1-TAIR
[3] 1 [ 91750, 95651] + | 4400 AT1G01220.1-TAIR
[4] 1 [ 95987, 97407] + | 7355 AT1G01225.1-TAIR
...
length(tdata)
[1] 43
sessionInfo()
R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.3.5 GenomicFeatures_0.5.0 GenomicRanges_0.1.0
[4] IRanges_1.5.73
loaded via a namespace (and not attached):
[1] Biobase_2.7.5 Biostrings_2.15.26 BSgenome_1.15.20 DBI_0.2-5
[5] RCurl_1.3-1 RSQLite_0.8-4 rtracklayer_1.7.11 tools_2.12.0
[9] XML_2.8-1
Thanks,
P. Terry
[email protected]
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing