Hi, Brooke,

Thank you very much for the information!  

According to the data I downloaded from
ftp://ftp.ncbi.nlm.nih.gov/hapmap, 4,163,790 RefSNPs were investigated
in HapMap Phase II and III.  However, snp132Common's statistics show
only about 2M.  So about 2M HapMap SNPs are filtered out by the
"unique mapping" and "1% frequency" criteria, which seems like too
many.  By my calculation, about 80% of the 4M HapMap SNPs have
frequency >1%.  I didn't check unique mappings, though.  I'm glad you
also spotted this discrepancy.
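
To make the arithmetic explicit, here is a rough sketch in Python.  The
2M figure is only the approximate count quoted above and the 80% share
is an estimate, so treat the result as an order-of-magnitude check, not
an exact number.

```python
# Rough check of how many HapMap SNPs the 1% frequency filter alone
# could plausibly remove. All inputs are the figures quoted in this thread.

hapmap_refsnps = 4_163_790        # RefSNPs in HapMap Phase II and III
in_common_track = 2_000_000       # approximate ("2M"), not an exact count
share_above_1pct = 80             # percent, an estimate

# SNPs expected to pass the frequency criterion (integer arithmetic).
expected_above_1pct = hapmap_refsnps * share_above_1pct // 100

# SNPs removed overall, and how many the frequency filter could explain.
removed_total = hapmap_refsnps - in_common_track
removed_by_frequency = hapmap_refsnps - expected_above_1pct

# Whatever is left would have to come from the unique-mapping criterion
# (or from something else, such as missing annotations).
unexplained = removed_total - removed_by_frequency

print("expected with frequency >1%:", expected_above_1pct)
print("removed in total:           ", removed_total)
print("removed by frequency filter:", removed_by_frequency)
print("left to explain:            ", unexplained)
```

Under these assumptions the frequency filter accounts for well under
half of the missing SNPs.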

Best

Xiang




-----Original Message-----
From: Brooke Rhead [mailto:[email protected]] 
Sent: Monday, April 25, 2011 4:06 PM
To: Xiang Li
Cc: [email protected]
Subject: Re: [Genome] Too few HapMap SNPs

Hi Xiang,

The snp132Common track is a subset of snp132, so it makes sense that
there are fewer HapMap SNPs there.  (snp132Common contains uniquely
mapped variants that have frequency info and appear in at least 1% of
the population -- see our announcement about the 4 new SNP tracks here:
http://genome.ucsc.edu/goldenPath/newsarch.html#041811.2 .)

However, the snp132 table only contains about 3.17 million SNPs listed
as 'by-hapmap' in the valid field, which seems low.  One of our
engineers is looking into this further.
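
For anyone who wants to reproduce this kind of count, a minimal sketch
in Python follows.  It assumes a tab-separated dump of the table in
which the valid field is a comma-separated set at 0-based column 12;
the column index and the function name count_validation_codes are
assumptions for illustration, so check them against the table schema
before relying on the numbers.

```python
from collections import Counter

# Assumed 0-based position of the 'valid' column in a snp132-style dump.
VALID_COL = 12


def count_validation_codes(lines, valid_col=VALID_COL):
    """Count how often each validation code appears in the valid field.

    A variant can carry several codes (e.g. "by-cluster,by-hapmap"),
    so the per-code counts can sum to more than the number of rows.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) <= valid_col:
            continue  # row too short to contain the valid field
        for code in fields[valid_col].split(","):
            code = code.strip()
            if code:
                counts[code] += 1
    return counts


if __name__ == "__main__":
    # Two made-up rows, padded so the valid field lands in column 12.
    demo = [
        "\t".join(["."] * VALID_COL + ["by-cluster,by-hapmap"]),
        "\t".join(["."] * VALID_COL + ["by-hapmap"]),
    ]
    for code, n in sorted(count_validation_codes(demo).items()):
        print(code, n)
```

Because each variant may be validated by more than one method, the
codes are counted separately and not per row.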

Regarding the validation codes, we don't have a more elaborate
explanation for you.  We suggest contacting dbSNP directly at
[email protected].  They might be able to point you to better
documentation.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 04/22/11 17:23, Xiang Li wrote:
> Hi
> 
>  
> 
> I downloaded the Table: snp132Common from Track: Common SNPs(132).

> 
>  
> 
> Quick statistics are shown below:
> 
>  
> 
> #type of validation      count
> by-1000genomes           4698898
> by-2hit-2allele          1064881
> by-cluster               3412491
> by-frequency             2847474
> by-hapmap                717332
> by-submitter             138079
> unknown                  54311
> 
>  
> 
> I saw only 717332 from HapMap, while on the HapMap FTP site
> (ftp://ftp.ncbi.nlm.nih.gov/hapmap), I saw over 4 million SNPs. 
> 
> Why is there such a huge difference?  Thanks
> 
>  
> 
> Also, where could I find a more detailed README regarding those
> validation types, so that I can get a better idea of how to assess
> each type?
> Currently, I can only assume HapMap and 1000Genomes are more reliable
> than the others.
> 
> *     Validation
> <http://www.ncbi.nlm.nih.gov/SNP/snp_legend.cgi?legend=validation> :
> Method used to validate the variant (each variant may be validated by
> more than one method)
> 
>       *       By Frequency - at least one submitted SNP in cluster has
>               frequency data submitted
>       *       By Cluster - cluster has at least 2 submissions, with at
>               least one submission assayed with a non-computational method
>       *       By Submitter - at least one submitter SNP in cluster was
>               validated by independent assay
>       *       By 2 Hit/2 Allele - all alleles have been observed in at
>               least 2 chromosomes
>       *       By HapMap - submitted by HapMap
>               <http://hapmap.ncbi.nlm.nih.gov/> project (human only)
>       *       By 1000Genomes - submitted by 1000Genomes
>               <http://1000genomes.org/> project (human only)
>       *       Unknown - no validation has been reported for this
>               variant
> 
> Thanks
> 
>  
> 
> Sean
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome



