Hi Xiang,

The snp132Common track is a subset of snp132, so it makes sense that
there are fewer HapMap SNPs there.  (snp132Common contains uniquely
mapped variants that have frequency info and appear in at least 1% of
the population -- see our announcement about the 4 new SNP tracks here:
http://genome.ucsc.edu/goldenPath/newsarch.html#041811.2 .)

However, the snp132 table only contains about 3.17 million SNPs listed
as 'by-hapmap' in the valid field, which seems low.  One of our
engineers is looking into this further.
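
In case it helps with reproducing this kind of tally, here is a minimal
sketch in Python.  It assumes a tab-separated dump of the table in which
the valid field is the 13th column (index 12) and holds a comma-separated
list of codes; please check both assumptions against the table schema
before relying on it.  The names count_validation_types and make_row, and
the sample rows, are made up for illustration.  Because a variant may be
validated by more than one method, the counts can add up to more than the
number of rows.

```python
from collections import Counter


def count_validation_types(lines, valid_col=12):
    """Tally validation codes from tab-separated table rows.

    valid_col is the 0-based index of the 'valid' column; 12 is an
    assumption about the dump layout, not something guaranteed here.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= valid_col:
            continue  # malformed or truncated row
        # One variant may carry several codes, e.g. "by-hapmap,by-frequency".
        for code in fields[valid_col].split(","):
            code = code.strip()
            if code:
                counts[code] += 1
    return counts


def make_row(name, valid):
    """Build a hypothetical 13-column row; only name and valid matter."""
    fields = ["."] * 12 + [valid]
    fields[4] = name
    return "\t".join(fields)


# Made-up sample rows, for illustration only.
sample = [
    "#header line is ignored",
    make_row("rs1", "by-hapmap,by-frequency"),
    make_row("rs2", "by-1000genomes"),
    make_row("rs3", "unknown"),
    make_row("rs4", "by-hapmap,by-cluster"),
]

if __name__ == "__main__":
    for code, n in sorted(count_validation_types(sample).items()):
        print(code, n, sep="\t")
```

To run it on a real download, pass an open file handle in place of
sample, for example count_validation_types(open(path)) where path points
to your uncompressed table dump.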

Regarding the validation codes, we don't have a more elaborate
explanation for you.  We suggest contacting dbSNP directly at
[email protected].  They might be able to point you to better
documentation.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 04/22/11 17:23, Xiang Li wrote:
> Hi
> 
>  
> 
> I downloaded the Table: snp132Common from Track: Common SNPs(132).  
> 
>  
> 
> Some quick statistics are shown below:
> 
>  
> 
> #type of validation      count
> by-1000genomes           4698898
> by-2hit-2allele          1064881
> by-cluster               3412491
> by-frequency             2847474
> by-hapmap                717332
> by-submitter             138079
> unknown                  54311
> 
>  
> 
> I saw only 717332 from HapMap, while on the HapMap FTP site
> (ftp://ftp.ncbi.nlm.nih.gov/hapmap), I saw over 4 million SNPs. 
> 
> Why is there such a huge difference?  Thanks
> 
>  
> 
> Also, where could I find a more detailed README regarding those
> validation types, so that I can get a better idea of how to assess each
> type?  Currently, I can only assume HapMap and 1000Genomes are more
> reliable than the others.
> 
> *     Validation
> <http://www.ncbi.nlm.nih.gov/SNP/snp_legend.cgi?legend=validation> :
> Method used to validate the variant (each variant may be validated by
> more than one method)
> 
>       *       By Frequency - at least one submitted SNP in cluster has
> frequency data submitted
>       *       By Cluster - cluster has at least 2 submissions, with at
> least one submission assayed with a non-computational method
>       *       By Submitter - at least one submitter SNP in cluster was
> validated by independent assay
>       *       By 2 Hit/2 Allele - all alleles have been observed in at
> least 2 chromosomes
>       *       By HapMap - submitted by HapMap
> <http://hapmap.ncbi.nlm.nih.gov/>  project (human only)
>       *       By 1000Genomes - submitted by 1000Genomes
> <http://1000genomes.org/>  project (human only)
>       *       Unknown - no validation has been reported for this
> variant
> 
> Thanks
> 
>  
> 
> Sean
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

