Dear Dr. Rhee,

Thank you for reporting this discrepancy. The coordinates in snp131CodingDbSnp.txt are in NCBI's [0-based, fully-closed] coordinate system, while snp131.txt has the BED [0-based, half-open) coordinates. Most records can be fixed to match snp131.txt by adding 1 to the end coordinate of snp131CodingDbSnp.txt.

However, NCBI uses 2-base-long records to represent point insertions (ncbiEnd = ncbiStart+1), while snp131 (BED) has 0-base-long records for those (chromEnd = chromStart). So unfortunately it is not just a 1-line awk script to fix the coordinates.
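As a rough illustration of the conversion described above (a sketch only, not the actual fix), the two cases could be handled as follows in Python. The function name and the is_insertion flag are hypothetical; the flag would have to come from another source, for example the matching record in snp131.txt, because a 2-base-long NCBI record does not by itself say whether it is a point insertion. Placing the 0-base-long BED record at ncbiEnd is also an assumption (the insertion point taken to lie between the two flanking bases), not something stated in this message.

```python
def ncbi_to_bed(ncbi_start, ncbi_end, is_insertion):
    """Convert a [0-based, fully-closed] interval to BED [0-based, half-open).

    is_insertion is a hypothetical flag supplied by the caller; the
    coordinates alone cannot tell a point insertion from a true 2-base record.
    """
    if is_insertion:
        # NCBI represents a point insertion by its two flanking bases.
        if ncbi_end != ncbi_start + 1:
            raise ValueError("expected ncbiEnd = ncbiStart+1 for a point insertion")
        # Assumption: the 0-base-long BED record sits between the flanking
        # bases, so chromStart = chromEnd = ncbiEnd.
        return ncbi_end, ncbi_end
    # Ordinary record: only the end coordinate changes (closed -> half-open).
    return ncbi_start, ncbi_end + 1


# rs62635282 from the report: 12197..12197 in snp131CodingDbSnp.txt
print(ncbi_to_bed(12197, 12197, False))
```

For the reported record this gives chromStart 12197 and chromEnd 12198, matching snp131.txt.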
I will look into fixing the coordinates in snp131CodingDbSnp.txt. Thanks again for reporting this, and please contact the list again if you have more questions.

Angie

----- "이환석" <[email protected]> wrote:

> From: "이환석" <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 28, 2010 5:14:08 PM GMT -08:00 US/Canada Pacific
> Subject: [Genome] Question about snp131CodingDbSnp.txt
>
> Dear UCSC Bioinformatics Group,
>
> I found that snp131CodingDbSnp.txt is not in 0-based BED format,
> while snp131.txt is in 0-based BED format.
>
> For example, in snp131.txt, rs62635282 is described as
> chr1 12197 12198 rs62635282 ... However, in snp131CodingDbSnp.txt,
> rs62635282 is described as chr1 12197 12197 rs62635282 ...
> What is the reason for this difference?
>
> Best regards,
> Hwanseok Rhee, Ph.D
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
