Hi Mary,

Thank you for the clarification. I had actually sent a list of ca. 8,500 such duplicate SNPs to the dbSNP team a couple of months ago, and they promised to look into the issue. After build 131 was released, I found that the number of duplicates has actually increased dramatically. I will make sure to contact them again. I was just hoping the UCSC team might decide to act on its own, because dbSNP has a long history of releasing poor-quality and plainly erroneous data. I don't want to blame them too much; dealing with the recent influx of resequencing data is an enormous challenge. But speaking from a user's experience, working with unclean data on a multi-genome scale can be very frustrating. I am sure you are familiar with the feeling ;).

Best,
Ivan

On Monday, June 14, 2010 06:29:17 pm Mary Goldman wrote:
> Hi Ivan,
>
> The only difference in the table between these two SNPs is the strand
> orientation. We obtain the rsIDs and genomic locations directly from
> dbSNP, who considers both of these SNPs valid
> (http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs3871692 and
> http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs62635282).
>
> Feel free to contact dbSNP at snp-admin at ncbi.nlm.nih.gov if you have
> any questions about these SNPs. I hope this information is helpful.
> Please feel free to contact the mail list again if you require further
> assistance.
>
> Best,
> Mary
> ---------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 6/11/10 10:31 PM, Ivan Adzhubey wrote:
> > Hi,
> >
> > I found about 3% of SNPs in hg19.snp131 table have exactly identical
> > annotations (chromosome, position, observed alleles, etc) except for
> > different rsIDs. An example is listed below. Is this a bug or a feature?
> >
> > Thanks,
> > Ivan
> >
> > mysql> select * from snp131 where chrom='chr1' and chromStart=12197 and
> > chromEnd=12198\G
> > *************************** 1. row ***************************
> >        bin: 585
> >      chrom: chr1
> > chromStart: 12197
> >   chromEnd: 12198
> >       name: rs3871692
> >      score: 0
> >     strand: -
> >    refNCBI: G
> >    refUCSC: G
> >   observed: C/G
> >    molType: genomic
> >      class: single
> >      valid: unknown
> >      avHet: 0
> >    avHetSE: 0
> >       func: missense
> >    locType: exact
> >     weight: 3
> > *************************** 2. row ***************************
> >        bin: 585
> >      chrom: chr1
> > chromStart: 12197
> >   chromEnd: 12198
> >       name: rs62635282
> >      score: 0
> >     strand: +
> >    refNCBI: G
> >    refUCSC: G
> >   observed: C/G
> >    molType: genomic
> >      class: single
> >      valid: unknown
> >      avHet: 0
> >    avHetSE: 0
> >       func: missense
> >    locType: exact
> >     weight: 3
> > 2 rows in set (3.52 sec)
> > _______________________________________________
> > Genome maillist - [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
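
For anyone wanting to reproduce the duplicate count discussed in this thread, one possible approach is to group rows by locus and observed alleles and report groups that carry more than one rsID. The following is only a minimal sketch, assuming the snp131 rows have already been exported as Python dictionaries keyed by column name; the function name `find_duplicate_snps` and the third sample row (`rsHYPOTHETICAL`) are illustrative and not part of the original messages.

```python
from collections import defaultdict


def find_duplicate_snps(rows):
    """Return loci whose annotations match but whose rsIDs differ.

    Rows are grouped by (chrom, chromStart, chromEnd, observed);
    strand and name are deliberately left out of the grouping key.
    """
    groups = defaultdict(set)
    for row in rows:
        key = (row["chrom"], row["chromStart"], row["chromEnd"], row["observed"])
        groups[key].add(row["name"])
    # Keep only the keys that are shared by two or more distinct rsIDs.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# The first two rows come from the example in this thread (reduced to the
# columns used here); the third is a made-up row added for contrast.
rows = [
    {"chrom": "chr1", "chromStart": 12197, "chromEnd": 12198,
     "name": "rs3871692", "strand": "-", "observed": "C/G"},
    {"chrom": "chr1", "chromStart": 12197, "chromEnd": 12198,
     "name": "rs62635282", "strand": "+", "observed": "C/G"},
    {"chrom": "chr1", "chromStart": 20000, "chromEnd": 20001,
     "name": "rsHYPOTHETICAL", "strand": "+", "observed": "A/T"},
]

duplicates = find_duplicate_snps(rows)
for locus, rsids in duplicates.items():
    print(locus, rsids)
```

The sketch compares the `observed` strings as they appear in the table and ignores strand. That works for the C/G example, which reads the same on either strand, but records reported on opposite strands with other allele pairs would probably need to be complemented before grouping.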
