Hi Xiang,

The snp132Common track is a subset of snp132, so it makes sense that there are fewer HapMap SNPs there. (snp132Common contains uniquely mapped variants that have frequency info and appear in at least 1% of the population -- see our announcement about the 4 new SNP tracks here: http://genome.ucsc.edu/goldenPath/newsarch.html#041811.2 .)

However, the snp132 table only contains about 3.17 million SNPs listed as 'by-hapmap' in the valid field, which seems low. One of our engineers is looking into this further.

Regarding the validation codes, we don't have a more elaborate explanation for you. We suggest contacting dbSNP directly at [email protected]. They might be able to point you to better documentation.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 04/22/11 17:23, Xiang Li wrote:
> Hi
>
> I downloaded the Table: snp132Common from Track: Common SNPs(132).
>
> Quick statistics are shown below:
>
> #type of validation    count
> by-1000genomes         4698898
> by-2hit-2allele        1064881
> by-cluster             3412491
> by-frequency           2847474
> by-hapmap              717332
> by-submitter           138079
> unknown                54311
>
> I saw only 717332 from HapMap, while on the HapMap FTP site
> (ftp://ftp.ncbi.nlm.nih.gov/hapmap) I saw over 4 million SNPs.
> Why is there such a huge difference? Thanks
>
> Also, where could I find a more detailed README regarding those
> validation types, so that I can better assess each type?
> Currently, I can only assume HapMap and 1000Genomes are more reliable
> than the others.
>
> * Validation
>   <http://www.ncbi.nlm.nih.gov/SNP/snp_legend.cgi?legend=validation> :
>   Method used to validate the variant (each variant may be validated by
>   more than one method)
>
>     * By Frequency - at least one submitted SNP in cluster has
>       frequency data submitted
>     * By Cluster - cluster has at least 2 submissions, with at
>       least one submission assayed with a non-computational method
>     * By Submitter - at least one submitter SNP in cluster was
>       validated by independent assay
>     * By 2 Hit/2 Allele - all alleles have been observed in at
>       least 2 chromosomes
>     * By HapMap - submitted by HapMap
>       <http://hapmap.ncbi.nlm.nih.gov/> project (human only)
>     * By 1000Genomes - submitted by 1000Genomes
>       <http://1000genomes.org/> project (human only)
>     * Unknown - no validation has been reported for this
>       variant
>
> Thanks
>
> Sean
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
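
For anyone who wants to reproduce a per-code tally like the one quoted in this thread, here is a minimal Python sketch. It is not an official UCSC tool. It assumes the table was downloaded as a tab-separated dump, that the valid field is a comma-separated list of codes, and that it sits at 0-based column 12; please check the column position against the table schema before relying on it. The function name and the command-line usage are hypothetical.

```python
import gzip
import sys
from collections import Counter

# Assumption: 0-based position of the "valid" field in the table dump.
# Verify this against the schema of the table you actually downloaded.
VALID_COLUMN = 12


def count_validation_codes(lines, valid_column=VALID_COLUMN):
    """Count how many rows carry each validation code.

    A row may list several codes, so it is counted once per code it lists.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows too short to contain the valid field.
        if len(fields) <= valid_column:
            continue
        for code in fields[valid_column].split(","):
            code = code.strip()
            if code:
                counts[code] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # path to a local table dump, plain or gzipped
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for code, n in sorted(count_validation_codes(handle).items()):
            print(f"{code}\t{n}")
```

Because each variant may be validated by more than one method, the per-code counts can add up to more than the number of rows in the table.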
