Hi Tim,
To add to Hiram's answer, the coordinate system depends on what you're
looking at. There are 4 different combinations of {UCSC vs. dbSNP,
web page vs. database table} and *3* different coordinate systems to
be aware of, because dbSNP has come up with its own (0-based, fully
closed) for its database tables.
Here's the breakdown:
UCSC details page Position: 1-based, fully closed
("chr7:133624600-133624603")
UCSC database table snp130: 0-based, half-open
("chr7 133624599 133624603")
dbSNP details page, GeneView section: 1-based, fully closed
(http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs5887673
"7 133624600:133624603")
(Caveat: sometimes the GeneView section uses coords from an alternate
assembly like Celera -- in that case, we may not have that SNP or may
have different coords, because we show SNPs mapped to the reference
genome (NCBI36, etc.) only.)
dbSNP database dump files: **0-based, fully closed**
(ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/b130_SNPContigLoc_36_3.bcp.gz)
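As a quick sanity check on the breakdown above, here is a small Python sketch (the helper names are made up for illustration) that counts how many bases each listed interval covers. Half-open intervals (which, as Hiram notes, is what UCSC's MySQL tables use) have length end - start; fully closed intervals have length end - start + 1. The dbSNP dump coordinates below are derived by subtracting 1 from both web-page coordinates; they are not copied from the actual dump file.

```python
# Sketch: the same 4-base SNP (rs5887673) written in the three coordinate
# conventions listed above. Helper names are hypothetical.

def length_half_open(start: int, end: int) -> int:
    """0-based, half-open (UCSC tables): the end base is excluded."""
    return end - start


def length_closed(start: int, end: int) -> int:
    """Fully closed (web pages, dbSNP dumps): both end bases are included."""
    return end - start + 1


web_page = (133624600, 133624603)    # 1-based, fully closed
ucsc_table = (133624599, 133624603)  # 0-based, half-open (snp130)
dbsnp_dump = (133624599, 133624602)  # 0-based, fully closed (derived, not read from the dump)

# All three describe the same 4 bases.
print(length_closed(*web_page), length_half_open(*ucsc_table), length_closed(*dbsnp_dump))
# prints: 4 4 4
```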
If you are working with dbSNP's database dump files, add 1 to the end
coord to get UCSC's internal coordinate system (0-based, half-open),
and then add 1 to the start coord as well to get the 1-based, fully
closed coordinates that are printed on web pages.
Hope that helps, and please send any further questions to us at
[email protected],
Angie
On Wed, 15 Jul 2009, Hiram Clawson wrote:
> Good Morning Tim:
>
> See also: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>
> Beware: our MySQL tables are zero-based, half-open, but when displaying
> intervals in the genome browser, all coordinates are in the more
> human-friendly terms of one-based, closed. SNPs can be neither, since the
> actual SNP (e.g. an insertion) may not exist in the reference sequence, and
> its coordinates end up with end == start, a seemingly zero-length item.
>
> --Hiram
>
> Tim Yu wrote:
> > Can someone explain to me the standards for denoting intervals in UCSC
> > BED format? I'm struggling with what seem to me to be inconsistencies:
> >
> > 1. From the UCSC FAQ describing the BED format
> > (http://genome.ucsc.edu/FAQ/FAQformat#format1), it sounds as if
> > intervals are LEFT-CLOSED/RIGHT-OPEN. For instance, a feature spanning
> > bases 0-99 (inclusively) is denoted chromStart 0, chromEnd 100.
> > "The first three required BED fields are:
> >
> > chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or
> > scaffold (e.g. scaffold10671).
> > chromStart - The starting position of the feature in the chromosome or
> > scaffold. The first base in a chromosome is numbered 0.
> > chromEnd - The ending position of the feature in the chromosome or
> > scaffold. The chromEnd base is not included in the display of the
> > feature. For example, the first 100 bases of a chromosome are defined
> > as chromStart=0, chromEnd=100, and span the bases numbered 0-99."
> > 2. On the other hand, when I download lists of exon start/stop
> > positions from hg18>UCSC Genes>knownGene>Exons in BED format, the
> > resulting intervals appear to be the opposite: LEFT-OPEN,
> > RIGHT-CLOSED. Here is a representative entry. Looking at the browser,
> > the actual exon encompasses bases 2476-2584:
> > chr1 2475 2584 uc001aaa.2_exon_1_0_chr1_2476_f 0 +
> >
> > 3. I wondered if it was a strand issue, but here is an entry on the - strand,
> > which is also LEFT-OPEN, RIGHT-CLOSED.
> > chr1 4832 4901 uc001aab.2_exon_1_0_chr1_4833_r 0 -
> >
> > 4. Finally, what convention does dbSNP use? Many seem to use closed
> > interval notation, e.g. rs5887673, listed at chr7:133624600-133624603.
> >
> > Thanks for any help.
> >
> > Tim
> > _______________________________________________
> > Genome maillist - [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> >
>