Re: [Genome] Notation standards: open vs closed intervals, BED format, UCSC Exons, and dbSNP

Angie Hinrichs Wed, 15 Jul 2009 10:52:54 -0700

Hi Tim,

To add to Hiram's answer, the coordinate system depends on what you're
looking at.  There are 4 different combinations of {UCSC vs. dbSNP,
web page vs. database table} and *3* different coordinate systems to
be aware of because dbSNP have come up with their own, 0-based fully
closed, in their database tables.


Here's the breakdown:

UCSC details page Position: 1-based, fully closed
("chr7:133624600-133624603")

UCSC database table snp130: 0-based, half-open
("chr7    133624599       133624603")

dbSNP details page, GeneView section: 1-based, fully closed
(http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs5887673
 "7   133624600:133624603")
(Caveat: sometimes the GeneView section uses coords from an alternate  
assembly like Celera -- in that case, we may not have that SNP or may
have different coords, because we show SNPs mapped to the reference
genome (NCBI36 etc) only.)

dbSNP database dump files: **0-based, fully closed**
(ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/b130_SNPContigLoc_36_3.bcp.gz)

If you are working with dbSNP's database dump files, add 1 to the end
coord to get the UCSC internal coordinate system (0-based, half-open),
and then add 1 to the start coord as well to get the 1-based, fully
closed coordinates that are printed out on web pages.
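
The arithmetic above can be sketched in Python.  This is only an
illustration: the function names are made up for this example and are
not part of any UCSC or dbSNP tool, and the dump-file end coordinate
used below (133624602) is the value implied by the rule above, not one
copied from the dump file.  Zero-length items (end == start in the UCSC
table) are not handled here.

```python
# Hypothetical helpers illustrating the three coordinate systems.
# Names are invented for this sketch; they do not come from any library.

def dbsnp_dump_to_ucsc(start, end):
    """dbSNP dump (0-based, fully closed) -> UCSC table (0-based, half-open)."""
    return start, end + 1

def ucsc_to_display(start, end):
    """UCSC table (0-based, half-open) -> web page (1-based, fully closed)."""
    return start + 1, end

def dbsnp_dump_to_display(start, end):
    """dbSNP dump (0-based, fully closed) -> web page (1-based, fully closed)."""
    return ucsc_to_display(*dbsnp_dump_to_ucsc(start, end))

# rs5887673, with the dump end coordinate assumed as described above.
dump_start, dump_end = 133624599, 133624602

print(dbsnp_dump_to_ucsc(dump_start, dump_end))     # (133624599, 133624603)
print(dbsnp_dump_to_display(dump_start, dump_end))  # (133624600, 133624603)
```

Going from the dump files straight to web-page coordinates therefore
amounts to adding 1 to both the start and the end.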

Hope that helps, and please send any further questions to us at
[email protected],

Angie

On Wed, 15 Jul 2009, Hiram Clawson wrote:

> Good Morning Tim:
> 
> See also: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
> 
> Beware, our MySQL tables are zero-based, half-open, but when displaying
> intervals on the genome browser all coordinates are in more human
> friendly terms of one-based closed.  SNPs can appear to be neither, since
> the actual SNP may not exist in the reference sequence, in which case its
> coordinates end up with end == start, seemingly a zero-length item.
> 
> --Hiram
> 
> Tim Yu wrote:
> > Can someone explain to me the standards for denoting intervals in UCSC  
> > BED format?  I'm struggling with what seem to me to be inconsistencies:
> > 
> > 1. From the UCSC FAQ describing the BED format
> > (http://genome.ucsc.edu/FAQ/FAQformat#format1), it sounds as if
> > intervals are LEFT-CLOSED/RIGHT-OPEN.  For instance, a feature
> > spanning bases 0-99 (inclusively) is denoted chromStart 0,
> > chromEnd 100.
> > "The first three required BED fields are:
> > 
> > chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or  
> > scaffold (e.g. scaffold10671).
> > chromStart - The starting position of the feature in the chromosome or  
> > scaffold. The first base in a chromosome is numbered 0.
> > chromEnd - The ending position of the feature in the chromosome or  
> > scaffold. The chromEnd base is not included in the display of the  
> > feature. For example, the first 100 bases of a chromosome are defined  
> > as chromStart=0, chromEnd=100, and span the bases numbered 0-99."
> > 2. On the other hand, when I download lists of exon start/stop
> > positions from hg18>UCSC Genes>knownGene>Exons in BED format, the
> > resulting intervals appear to be the opposite: LEFT-OPEN,
> > RIGHT-CLOSED.  Here is a representative entry. Looking at the
> > browser, the actual exon encompasses bases 2476-2584:
> > chr1        2475    2584    uc001aaa.2_exon_1_0_chr1_2476_f 0       +
> > 
> > 3. I wondered if it was a strand issue, but here is an entry on the -  
> > strand, which is also LEFT-OPEN, RIGHT-CLOSED.
> > chr1        4832    4901    uc001aab.2_exon_1_0_chr1_4833_r 0       -
> > 
> > 4. Finally, what convention does dbSNP use?  Many seem to use closed  
> > interval notation, eg rs5887673, listed at chr7:133624600-133624603.
> > 
> > Thanks for any help.
> > 
> > Tim
> > _______________________________________________
> > Genome maillist  -  [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> > 
