Hi Greg,

As you indicated in another email, we recommend removing SNPs that map to multiple locations to help increase mapping accuracy. One of our developers adds that "duplicated regions of the genome are difficult to assemble and might be more likely to change from one assembly to the next than unique regions. Also, I believe dbSNP did not remap SNP flanking sequences to GRCh37/hg19, but did their own coordinate translation of NCBI36/hg18 snp130 mappings to GRCh37 and possibly some filtering." Thus, the liftOver file may not be of as high quality for these regions, and dbSNP's own process is another variable.

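As a rough illustration of the filtering step recommended above, here is a minimal Python sketch that drops multiply-mapped SNPs before running liftOver. It assumes the SNPs are in tab-separated BED format with the SNP identifier in the fourth (name) column, and it treats any identifier that appears on more than one line as multiply mapped. The function name drop_multiply_mapped and the example identifiers are made up for illustration; this is not code from our tools.

```python
from collections import Counter


def drop_multiply_mapped(bed_lines):
    """Keep only BED records whose name (column 4) occurs exactly once.

    Assumption: tab-separated BED lines with at least four columns,
    where column 4 holds the SNP identifier.
    """
    records = [
        line.rstrip("\n").split("\t")
        for line in bed_lines
        # Skip blank lines, comments, and track/browser header lines.
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    # Count how many mapped locations each SNP identifier has.
    counts = Counter(rec[3] for rec in records)
    return [rec for rec in records if counts[rec[3]] == 1]


# Made-up example: rsB is listed at two locations, so it is removed.
example = [
    "chr1\t100\t101\trsA\n",
    "chr1\t500\t501\trsB\n",
    "chr7\t900\t901\trsB\n",
]
for rec in drop_multiply_mapped(example):
    print("\t".join(rec))
```

In this example only the rsA record is kept. The surviving records can be written back out as a BED file and passed to liftOver as usual.
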
She also added "As of snp132, we are going to do more to separate out those multiply-mapped SNPs because they cause trouble in all sorts of analyses. The 1000Genomes pilot project has identified genomic regions (in hg18) that are not unique enough for the alignment tools to be very confident that they have identified the right match, and, anecdotally, I've seen a lot of multiply-mapped SNPs in or next to those regions."

I hope this information is helpful. Please feel free to contact the mail list again if you require further assistance.

Best,
Mary

------------------
Mary Goldman
UCSC Bioinformatics Group

On 1/11/11 7:49 AM, Gregory Dougherty wrote:
> We are evaluating the liftOver tool for converting SNPs, that our researchers
> have found, from hg18 to hg19 (IOW, not ones that are in dbSNPs). As a test
> I ran the hg18 dbSNPs 130 through liftOver, then compared the results to the
> hg19 dbSNPs 130.
>
> My results:
> Starting: 18,833,531
> Converted by liftOver: 15,423,712
> Match Official position: 11,788,968
> Failed to map: 3,409,819
> Mapped to "wrong" location: 3,634,744
>
> 1: Should my experiment have worked? "Should" the liftOver tool have gotten
> all the SNPs to their official hg19 locations?
> 2: I'm not really worried about the SNPs that failed to map, but have ~20% of
> the SNPs map to the wrong location is kind of concerning. What is your
> expected error rate?
>
> Thank you,
>
> Greg
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
