I've loaded SNP130 into a local database (thank you very much for the data files, etc.) and have some questions about the data.
To start, my understanding is that chromosome positions are [start, end), i.e. from start (inclusive) to end (exclusive). Or, to put it another way, if start = 5 and end = 6, then you have a 1 bp feature at position 5. No?

I ask because I got these results from some searches:

mysql> select count(*) from snp130 where chromStart = chromEnd;
+----------+
| count(*) |
+----------+
| 2,632,502|
+----------+

mysql> select count(*) from snp130 where chromStart = chromEnd - 1;
+----------+
| count(*) |
+----------+
|15,322,316|
+----------+

The fact that there are roughly 6x as many SNPs where chromEnd - chromStart = 1 suggests to me that my understanding is correct, but that leaves me wondering why there are 2.6 million "SNPs" that don't cover any bases.

Also, IIRC, the first base of a chromosome is base 0, yes?

TIA,
Greg
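
P.S. To make my assumption concrete, here is a minimal Python sketch of how I am reading the coordinates: 0-based and half-open, [chromStart, chromEnd). The helper names are my own (they are not part of any UCSC tool), and the whole thing reflects only my interpretation of the convention, which is exactly what I'd like confirmed.

```python
def feature_length(chrom_start, chrom_end):
    """Number of bases covered by the half-open interval [chrom_start, chrom_end)."""
    if chrom_end < chrom_start:
        raise ValueError("chromEnd must not be less than chromStart")
    return chrom_end - chrom_start


def covered_positions(chrom_start, chrom_end):
    """0-based positions covered; the end coordinate itself is excluded."""
    return list(range(chrom_start, chrom_end))


# start = 5, end = 6: one base, at 0-based position 5
print(feature_length(5, 6), covered_positions(5, 6))

# start = end = 5: no bases covered, which is the case I am asking about
print(feature_length(5, 5), covered_positions(5, 5))
```

Under this reading, the 2.6 million rows with chromStart = chromEnd would each have length 0, which is why they puzzle me.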
