Can someone explain to me the standards for denoting intervals in UCSC BED format? I'm struggling with what seem to me to be inconsistencies:
1. From the UCSC FAQ describing the BED format (http://genome.ucsc.edu/FAQ/FAQformat#format1), it sounds as if intervals are LEFT-CLOSED/RIGHT-OPEN. For instance, a feature spanning bases 0-99 (inclusive) is denoted chromStart 0, chromEnd 100.

"The first three required BED fields are:
chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99."

2. On the other hand, when I download lists of exon start/stop positions from hg18>UCSC Genes>knownGene>Exons in BED format, the resulting intervals appear to be the opposite: LEFT-OPEN, RIGHT-CLOSED. Here is a representative entry. Looking at the browser, the actual exon encompasses bases 2476-2584:

chr1 2475 2584 uc001aaa.2_exon_1_0_chr1_2476_f 0 +

3. I wondered if it was a strand issue, but here is an entry on the - strand, which is also LEFT-OPEN, RIGHT-CLOSED:

chr1 4832 4901 uc001aab.2_exon_1_0_chr1_4833_r 0 -

4. Finally, what convention does dbSNP use? Many entries seem to use closed interval notation, e.g. rs5887673, listed at chr7:133624600-133624603.

Thanks for any help.

Tim
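
P.S. To make the arithmetic in the FAQ definition concrete, here is a minimal Python sketch. It assumes the FAQ is taken at face value (chromStart is 0-based, chromEnd is excluded) and that the browser display numbers bases from 1 and includes both endpoints. The function names are hypothetical and are not part of any UCSC tool.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based, half-open BED interval [chrom_start, chrom_end)
    to a 1-based, closed range [start, end].

    Only the start shifts by one; the excluded 0-based end has the same
    number as the last included 1-based base.
    """
    return chrom_start + 1, chrom_end


def one_based_to_bed(start, end):
    """Inverse conversion: 1-based closed range to 0-based half-open BED."""
    return start - 1, end


def bed_length(chrom_start, chrom_end):
    """Number of bases covered by a half-open BED interval."""
    return chrom_end - chrom_start


# The two exon entries quoted above, under the stated assumptions.
print(bed_to_one_based(2475, 2584))  # (2476, 2584)
print(bed_to_one_based(4832, 4901))  # (4833, 4901)

# The FAQ example: chromStart=0, chromEnd=100 covers 100 bases.
print(bed_length(0, 100))  # 100
```

Under those assumptions, a 0-based half-open interval and a 1-based "left-open, right-closed" reading give the same numbers, and the converted starts (2476 and 4833) match the positions embedded in the exon names on both strands.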
