Hey Angie,

Thank you again for your help.  I see, so the difference is due to the 
different versions of the dbSNP database, rather than any customized handling 
on the Genome Browser end. 

Best

Sean


-----Original Message-----
From: Angie Hinrichs [mailto:[email protected]] 
Sent: Monday, August 08, 2011 9:42 AM
To: Xiang Li
Cc: [email protected]
Subject: Re: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

Hi Sean,

The short answer is that we show the allele frequencies that were in dbSNP's 
b132 release download files, and those files don't contain as many counts as 
shown on the web page now.  I suspect that dbSNP updates data on their web 
pages more frequently than the release schedule, but of course it's best to ask 
them about that.  

Our process involves ftp'ing dbSNP's database dump files after they announce a 
release (ftp://ftp.ncbi.nlm.nih.gov/snp/database/organism_data/human_9606/), 
and processing those into our local format.  Allele frequencies and counts come 
directly from dbSNP's SNPAlleleFreq table.  For snp132, we downloaded the table 
in Nov. 2010; at that time, there were almost no allele counts for 1000 
Genomes, so I asked dbSNP about it and they realized that their frequency 
tables had not been updated in some time.  At the end of Dec. 2010, they 
regenerated a few tables including SNPAlleleFreq.  I downloaded SNPAlleleFreq 
again in early Jan. 2011, and built our snp132 track, so our allele frequencies 
are from Dec. 2010.  This is what dbSNP's Dec. 2010 SNPAlleleFreq table has for 
rs538 and rs222:

mysql> select * from SNPAlleleFreq where snp_id = 538;
+--------+-----------+---------+-----------+---------------------+
| snp_id | allele_id | chr_cnt | freq      | last_updated_time   |
+--------+-----------+---------+-----------+---------------------+
|    538 |         2 |       1 | 0.0526316 | 2010-12-26 18:59:47 | 
|    538 |         4 |       4 |  0.210526 | 2010-12-26 18:59:47 | 
|    538 |         7 |      14 |  0.736842 | 2010-12-26 18:59:47 | 
+--------+-----------+---------+-----------+---------------------+

mysql> select * from SNPAlleleFreq where snp_id = 222;
+--------+-----------+---------+----------+---------------------+
| snp_id | allele_id | chr_cnt | freq     | last_updated_time   |
+--------+-----------+---------+----------+---------------------+
|    222 |         4 |    1823 | 0.714342 | 2010-12-26 18:59:47 | 
|    222 |         7 |     729 | 0.285658 | 2010-12-26 18:59:47 | 
+--------+-----------+---------+----------+---------------------+
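
As a quick cross-check of the two result sets above, the short Python sketch below sums chr_cnt per SNP and recomputes each allele's share. It assumes that freq is simply chr_cnt divided by the SNP's total chr_cnt; that is inferred from the numbers shown here, not from dbSNP documentation. The names counts and allele_freqs are made up for this illustration.

```python
# chr_cnt values copied from the two query results above, keyed by allele_id.
counts = {
    538: {2: 1, 4: 4, 7: 14},
    222: {4: 1823, 7: 729},
}


def allele_freqs(chr_cnts):
    """Return (total, {allele_id: chr_cnt / total}) for one SNP.

    Assumption: freq = chr_cnt / sum of chr_cnt over the SNP's alleles.
    """
    total = sum(chr_cnts.values())
    return total, {allele_id: cnt / total for allele_id, cnt in chr_cnts.items()}


for snp_id, chr_cnts in counts.items():
    total, freqs = allele_freqs(chr_cnts)
    print(f"rs{snp_id}: total chr_cnt = {total}")
    for allele_id, freq in freqs.items():
        print(f"  allele_id {allele_id}: freq = {freq:.6f}")
```

The totals come out to 19 for rs538 and 2552 for rs222, which are the same totals the Genome Browser reports in allele_copy_num for these two SNPs.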

We are eagerly anticipating dbSNP's next release of human SNPs, which hopefully 
will happen this fall (and perhaps even sooner).  dbSNP's summary page 
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi indicates that since 
132 there have been 46M new submissions for human, 24M of which include 
genotypes and/or allele frequencies.  

Hope that helps, and please email us at [email protected] if you have any 
more questions,
Angie


----- Original Message -----
From: "Xiang Li" <[email protected]>
To: [email protected]
Sent: Sunday, August 7, 2011 9:50:30 AM
Subject: Re: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

The same is true for rs222. There are 3200 allele samples based on
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=222,

But there are only 2552 samples based on genome browser (T,C,
1823.000000,729.000000)

Did NCBI update their dbSNP database just recently? 

Sean

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Xiang Li
Sent: Sunday, August 07, 2011 8:15 AM
To: [email protected]
Subject: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

Hi, Dear support,

 

Could you please help me understand why the allele_copy_num is different
from the NCBI dbSNP record for rs538?

 

Based on http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=538,
there are hundreds of samples in HapMap pilot studies.  

 

However, the UCSC Genome Browser shows a total of only 19 allele copies
there:

------------------------------------------------------------------------

alleles:            G,T,C
allele_copy_num:    1.000000,4.000000,14.000000

------------------------------------------------------------------------

 

If I don't include the samples in HapMap pilot studies, the numbers would
match exactly.  This is just one of many examples.  Can you please help
me understand what rules you used to derive those numbers, for example
why the HapMap pilot studies are not included?

 

Thanks.

 

Best

 

Sean

 

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
