On 04/13/2010 12:35 PM, margherita mutarelli wrote:
Dear all,
I apologize if I missed this, but I have looked through the
documentation and vignettes of the IRanges package and could not find
this information:
are the coordinates in IRanges objects considered as "0-indexed" or
"1-indexed"?
I.e. when importing the refGene.txt table (or any other) from UCSC, we know
that the start coordinates are 0-indexed, meaning that the base at the listed
start number, read as a 1-based position, is not part of the
gene/transcript/object.
If IRanges are 1-indexed, this means we have to subtract 1 from the start
coordinate present in the table when creating an IRanges object from it.
Is that correct?
Hi Margherita,
this topic always causes problems. As far as I understand the situation,
you have to add 1 to the start of the coordinates you have downloaded (I
assume a BED file) from UCSC.
Let me try to explain with a simple example.
We have two features ranging from 1 to 5 and from 5 to 10. We can create
simple IRanges objects:
> f1 <- IRanges(c(1), c(5))
> f2 <- IRanges(c(5), c(10))
>
> f1
IRanges of length 1
    start end width
[1]     1   5     5
> f2
IRanges of length 1
    start end width
[1]     5  10     6
>
and of course, they do overlap:
> findOverlaps(f1,f2)
An object of class “RangesMatching”
Slot "matchMatrix":
     query subject
[1,]     1       1
Slot "DIM":
[1] 1 1
>
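The widths (5 and 6) and the overlap above follow from treating both ends as inclusive. Here is a minimal base-R sketch of that arithmetic; `closed_width` and `closed_overlap` are hypothetical helper names of my own, not functions from the IRanges package:

```r
# Hypothetical helpers for closed ('1-based', 'end inclusive') intervals.
# They only illustrate the arithmetic; they are not part of IRanges.
closed_width <- function(start, end) {
  end - start + 1L                  # both endpoints are counted
}

closed_overlap <- function(s1, e1, s2, e2) {
  max(s1, s2) <= min(e1, e2)        # sharing a single position is enough
}

closed_width(1L, 5L)                # 5
closed_width(5L, 10L)               # 6
closed_overlap(1L, 5L, 5L, 10L)     # TRUE: both intervals contain position 5
```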
Now let's assume we got these numbers from UCSC as part of a BED file
for S. cerevisiae, chromosome 11:
chrXI 1 5
chrXI 5 10
BED files are '0-based' and 'end exclusive' (see:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1).
On the chromosome (with a '0-based' notation) this would look like:
     0 1 2 3 4 5 6 7 8 9 10
     C A C C A C A C C C A
f1     * * * *
f2             * * * * *
=> they don't overlap!
Play with the 'upload custom track' tool on the UCSC genome browser (using
the small BED file from above) in case this is still confusing.
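The same check can be written down for the BED convention. This is only a sketch in base R, assuming half-open intervals (start counted from 0, end not included); `half_open_width` and `half_open_overlap` are hypothetical names:

```r
# Hypothetical helpers for half-open ('0-based', 'end exclusive') intervals,
# i.e. the convention used by the BED start and end columns.
half_open_width <- function(start, end) {
  end - start                       # the end position itself is not counted
}

half_open_overlap <- function(s1, e1, s2, e2) {
  max(s1, s2) < min(e1, e2)         # strict: touching ends do not overlap
}

half_open_width(1L, 5L)             # 4 bases
half_open_width(5L, 10L)            # 5 bases
half_open_overlap(1L, 5L, 5L, 10L)  # FALSE: f1 stops just before f2 starts
```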
Now back to IRanges (which are '1-based' and 'end inclusive')
     1 2 3 4 5 6 7 8 9 10 11
     C A C C A C A C C C A
f1     * * * *
f2             * * * * *
Our new numbers are 2 to 5 and 6 to 10 (which corresponds to adding 1
to the start before we create the IRanges object):
> ff1 <- IRanges(c(2), c(5))
> ff2 <- IRanges(c(6), c(10))
> findOverlaps(ff1,ff2)
An object of class “RangesMatching”
Slot "matchMatrix":
query subject
Slot "DIM":
[1] 1 1
>
=> they don't overlap.
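For a whole table, the conversion amounts to adding 1 to every start and leaving the ends alone. A minimal base-R sketch, assuming the BED columns are named chrom, chromStart and chromEnd as in the UCSC format description; `bed_to_one_based` is a hypothetical helper, not an existing function:

```r
# The two example features from the small BED file above.
bed <- data.frame(chrom      = c("chrXI", "chrXI"),
                  chromStart = c(1L, 5L),
                  chromEnd   = c(5L, 10L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: BED ('0-based', 'end exclusive') to
# '1-based', 'end inclusive' coordinates.
bed_to_one_based <- function(bed) {
  data.frame(chrom = bed$chrom,
             start = bed$chromStart + 1L,   # shift the start by one
             end   = bed$chromEnd,          # the end stays as it is
             stringsAsFactors = FALSE)
}

converted <- bed_to_one_based(bed)
converted$start                              # 2 6
converted$end                                # 5 10

# The feature widths are unchanged by the conversion (4 and 5 bases).
converted$end - converted$start + 1L

# With the IRanges package loaded, these columns could then be passed on as
# IRanges(start = converted$start, end = converted$end)
```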
I hope this helps
Hans
This can be important to clarify, both when considering overlap of features
and in junctions, since it can shift the correct exon boundaries.
Cheers,
Margherita
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing