Hi Martin,

dbSNP suffers from some garbage-in, garbage-out issues, and because we display 
what we get from dbSNP, so do we.  Ultimately, the strange values come from 
people who type in information about their SNP discoveries when submitting to 
dbSNP, apparently with too few constraints on input.  

If you click through our details page to dbSNP's report page for your example 
SNPs (e.g. http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs3215906), 
and scroll down to the Population Diversity section, you can see that dbSNP 
reports whatever genotypes/alleles appear in submissions (e.g. +/+, +/-, -/-, 
-/T).  You can click into submitted SNPs (IDs begin with ss) to see who 
submitted alleles like "0" or "+".

If you would like to take this up with dbSNP, you can email them at 
[email protected].  It would be great if they had the resources to 
find and fix anomalies like this in their hundreds of millions of submitted 
SNPs.  In the meantime, I will add a new flag to the exceptions column to 
report allele frequencies that are inconsistent with the observed alleles, for 
the next release of our SNPs track (currently waiting for build 135 ftp files 
to become available).
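
To make the idea of that check concrete, here is a minimal sketch in Python. 
It is not the code that will go into the track build; the function name 
unexpected_alleles is made up for illustration, and it assumes the observed 
alleles arrive as a slash-separated string and the frequency alleles as a 
comma-separated string, as in the rows you quoted.  It ignores strand, so it 
does not reverse-complement anything.

```python
def unexpected_alleles(observed, freq_alleles):
    """Return alleles that have a frequency but are absent from 'observed'.

    observed     -- slash-separated observed alleles, e.g. "A/T"
    freq_alleles -- comma-separated alleles with frequencies, e.g. "T,A,0,"
    Hypothetical helper for illustration only; strand is not considered.
    """
    observed_set = set(observed.split("/"))
    # The table stores a trailing comma, so skip empty fields.
    candidates = [a for a in freq_alleles.split(",") if a]
    return [a for a in candidates if a not in observed_set]


print(unexpected_alleles("A/T", "T,A,0,"))  # rs2229621
print(unexpected_alleles("-/T", "T,-,+,"))  # rs3215906
```

For your two examples this reports "0" and "+" respectively, which are the 
values you asked about.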

Angie

P.S. If you're interested in the details -- we extract alleles and frequencies 
directly from tab-separated database dump files downloaded from dbSNP:

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/SNPAlleleFreq.bcp.gz
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/shared_data/Allele.bcp.gz

The columns of SNPAlleleFreq.bcp.gz are snp_id, allele_id, chr_cnt, freq, 
last_updated_time.  The columns of Allele.bcp.gz are allele_id, allele, 
create_time, rev_allele_id, src, last_updated_time.  We load them into MySQL, 
join them on allele_id, and extract the snp_id, allele, chr_cnt and freq 
columns for use in our snpNNN tables.  Here are some MySQL queries for your 
example rs IDs:

<pre>
mysql> select snp_id, Allele.allele_id, allele, chr_cnt, freq from 
SNPAlleleFreq,Allele where Allele.allele_id = SNPAlleleFreq.allele_id and 
snp_id = 2229621;
+---------+-----------+--------+---------+------------+
| snp_id  | allele_id | allele | chr_cnt | freq       |
+---------+-----------+--------+---------+------------+
| 2229621 |         4 | T      |     423 |   0.581044 | 
| 2229621 |         6 | A      |     299 |   0.410714 | 
| 2229621 |    396064 | 0      |       6 | 0.00824176 | 
+---------+-----------+--------+---------+------------+

mysql> select snp_id, Allele.allele_id, allele, chr_cnt, freq from 
SNPAlleleFreq,Allele where Allele.allele_id = SNPAlleleFreq.allele_id and 
snp_id = 3215906;
+---------+-----------+--------+---------+------------+
| snp_id  | allele_id | allele | chr_cnt | freq       |
+---------+-----------+--------+---------+------------+
| 3215906 |         4 | T      |       1 | 0.00253807 | 
| 3215906 |         5 | -      |      62 |    0.15736 | 
| 3215906 |         8 | +      |     331 |   0.840102 | 
+---------+-----------+--------+---------+------------+
</pre>
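
If you would rather not load the dumps into a database, the same join can be 
sketched in Python directly on the tab-separated lines.  This is only an 
illustration under the column layouts listed above, not our pipeline code; the 
function name join_allele_freqs is made up, and the "?" fields in the sample 
rows are placeholders for columns whose values are not shown in this email.  
For the real files, something like gzip.open(path, "rt") could supply the 
lines.

```python
import csv
import io


def join_allele_freqs(freq_lines, allele_lines, snp_id):
    """Join SNPAlleleFreq rows to Allele rows on allele_id for one snp_id.

    Returns a list of (snp_id, allele, chr_cnt, freq) tuples.
    Hypothetical helper; assumes the column orders described in the email.
    """
    # Allele columns: allele_id, allele, create_time, rev_allele_id, src,
    # last_updated_time
    allele_by_id = {
        row[0]: row[1]
        for row in csv.reader(allele_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        if row
    }
    result = []
    # SNPAlleleFreq columns: snp_id, allele_id, chr_cnt, freq, last_updated_time
    for row in csv.reader(freq_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row and row[0] == str(snp_id) and row[1] in allele_by_id:
            result.append(
                (int(row[0]), allele_by_id[row[1]], float(row[2]), float(row[3]))
            )
    return result


# snp_id, allele_id, allele, chr_cnt and freq are taken from the first query
# output above; every "?" is a placeholder, not a real dump value.
allele_dump = io.StringIO(
    "4\tT\t?\t?\t?\t?\n"
    "6\tA\t?\t?\t?\t?\n"
    "396064\t0\t?\t?\t?\t?\n"
)
freq_dump = io.StringIO(
    "2229621\t4\t423\t0.581044\t?\n"
    "2229621\t6\t299\t0.410714\t?\n"
    "2229621\t396064\t6\t0.00824176\t?\n"
)

for record in join_allele_freqs(freq_dump, allele_dump, 2229621):
    print(record)
```

The Allele table is read into a dictionary first because it is keyed by 
allele_id, so each frequency row needs only one lookup.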


----- Original Message -----
> Hello:
> 
> I would like to use the allele frequency data in snp132Common (hg19).
> But I don't understand some of the "alleles" in column 23 of this
> table.
> 
> For example, here is the table row for rs2229621:
> 
> 1157 chr2     75099476        75099477        rs2229621       0       +       
> A       A       A/T     genomic single
>       by-cluster,by-frequency,by-submitter,by-hapmap,by-1000genomes
> 0.378067 0.214706 missense exact 1  15
> 1000GENOMES,APPLERA_GI,BGI,CANCER-GENOME,COMPLETE_GENOMICS,CSHL-HAPMAP,EGP_SNPS,HGSV,IBARROSO,ILLUMINA,IMCJ-GDT,PERLEGEN,SEATTLESEQ,WICVAR,YUSUKE,
> 3 T,A,0, 423.000000,299.000000,6.000000, 0.581044,0.410714,0.008242,
> maf-5-some-pop,maf-5-all-pops
> 
> Column 23 is: T,A,0,
> What does "0" mean?
> (These "0"s are very common for chrX.)
> 
> Another example, here is rs3215906:
> 
> 764   chr1    23518470  23518471    rs3215906   0     -       A       A       
> -/T     genomic
>       deletion        by-cluster,by-frequency 0.270941 0.249121 unknown 
> exact1  4
> BUSHMAN,DEVINE_LAB,SNP500CANCER,YUSUKE, 3 T,-,+,
> 1.000000,62.000000,331.000000, 0.002538,0.157360,0.840102,
> maf-5-some-pop,maf-5-all-pops
> 
> Column 23 is: T,-,+,
> What does "+" mean?
> 
> Have a nice day,
> Martin Frith
> http://www.cbrc.jp/~martin/
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> 