Hi Martin,

dbSNP suffers from some garbage-in, garbage-out issues, and because we display what we get from dbSNP, so do we. Ultimately, the strange values come from people who type in information about their SNP discoveries when submitting to dbSNP, apparently with too few constraints on input.
If you click through our details page to dbSNP's report page for one of your example SNPs (e.g. http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs3215906) and scroll down to the Population Diversity section, you can see that dbSNP reports whatever genotypes/alleles appear in submissions (e.g. +/+, +/-, -/-, -/T). You can click into the submitted SNPs (IDs beginning with ss) to see who submitted alleles like "0" or "+".

If you would like to take this up with dbSNP, you can email them at [email protected]. It would be great if they had the resources to find and fix anomalies like this in their hundreds of millions of submitted SNPs.

In the meantime, I will add a new flag to the exceptions column to report allele frequencies that are inconsistent with the observed alleles, for the next release of our SNPs track (currently waiting for the build 135 ftp files to become available).

Angie

P.S. If you're interested in the details: we extract alleles and frequencies directly from tab-separated database dump files downloaded from dbSNP:

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/SNPAlleleFreq.bcp.gz
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/shared_data/Allele.bcp.gz

The columns of SNPAlleleFreq.bcp.gz are snp_id, allele_id, chr_cnt, freq, last_updated_time. The columns of Allele.bcp.gz are allele_id, allele, create_time, rev_allele_id, src, last_updated_time. We load them into MySQL, join them on allele_id, and extract the snp_id, allele, chr_cnt and freq columns for use in our snpNNN tables.
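For readers who would rather not load the dumps into a database, the join on allele_id described above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the function names (read_bcp, join_allele_freqs, unexpected_alleles) are hypothetical, the sample rows reuse the rs2229621 values with the unused columns left empty as placeholders, and the last function is just one naive way to spot frequency alleles that are missing from an observed string such as "A/T". It ignores strand and is not the planned exceptions flag.

```python
import csv
import io

# Column layouts as listed for the two dbSNP dump files.
FREQ_COLS = ["snp_id", "allele_id", "chr_cnt", "freq", "last_updated_time"]
ALLELE_COLS = ["allele_id", "allele", "create_time", "rev_allele_id", "src",
               "last_updated_time"]


def read_bcp(handle, columns):
    """Yield one dict per tab-separated line, keyed by the given column names."""
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if fields:  # skip blank lines
            yield dict(zip(columns, fields))


def join_allele_freqs(freq_handle, allele_handle, snp_id):
    """Join the two dumps on allele_id; return (allele, chr_cnt, freq) for one SNP."""
    allele_by_id = {row["allele_id"]: row["allele"]
                    for row in read_bcp(allele_handle, ALLELE_COLS)}
    return [
        (allele_by_id[row["allele_id"]], float(row["chr_cnt"]), float(row["freq"]))
        for row in read_bcp(freq_handle, FREQ_COLS)
        if row["snp_id"] == str(snp_id) and row["allele_id"] in allele_by_id
    ]


def unexpected_alleles(joined, observed):
    """Return alleles that have a frequency but are absent from the observed string."""
    expected = set(observed.split("/"))
    return [allele for allele, _, _ in joined if allele not in expected]


# Sample rows for rs2229621; columns not used here are left empty (placeholders).
FREQ_SAMPLE = (
    "2229621\t4\t423\t0.581044\t\n"
    "2229621\t6\t299\t0.410714\t\n"
    "2229621\t396064\t6\t0.00824176\t\n"
)
ALLELE_SAMPLE = (
    "4\tT\t\t\t\t\n"
    "6\tA\t\t\t\t\n"
    "396064\t0\t\t\t\t\n"
)

joined = join_allele_freqs(io.StringIO(FREQ_SAMPLE), io.StringIO(ALLELE_SAMPLE), 2229621)
flagged = unexpected_alleles(joined, "A/T")
print(joined)
print(flagged)
```

For the real downloads, the same functions should work on handles opened with gzip.open(path, "rt") instead of io.StringIO. The sketch holds the whole Allele table in a dictionary, which keeps the code short at the cost of memory.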
Here are some MySQL queries for your example rs IDs:

<pre>
mysql> select snp_id, Allele.allele_id, allele, chr_cnt, freq from SNPAlleleFreq,Allele where Allele.allele_id = SNPAlleleFreq.allele_id and snp_id = 2229621;
+---------+-----------+--------+---------+------------+
| snp_id  | allele_id | allele | chr_cnt | freq       |
+---------+-----------+--------+---------+------------+
| 2229621 |         4 | T      |     423 |   0.581044 |
| 2229621 |         6 | A      |     299 |   0.410714 |
| 2229621 |    396064 | 0      |       6 | 0.00824176 |
+---------+-----------+--------+---------+------------+

mysql> select snp_id, Allele.allele_id, allele, chr_cnt, freq from SNPAlleleFreq,Allele where Allele.allele_id = SNPAlleleFreq.allele_id and snp_id = 3215906;
+---------+-----------+--------+---------+------------+
| snp_id  | allele_id | allele | chr_cnt | freq       |
+---------+-----------+--------+---------+------------+
| 3215906 |         4 | T      |       1 | 0.00253807 |
| 3215906 |         5 | -      |      62 |    0.15736 |
| 3215906 |         8 | +      |     331 |   0.840102 |
+---------+-----------+--------+---------+------------+
</pre>

----- Original Message -----
> Hello:
>
> I would like to use the allele frequency data in snp132Common (hg19).
> But I don't understand some of the "alleles" in column 23 of this
> table.
>
> For example, here is the table row for rs2229621:
>
> 1157 chr2 75099476 75099477 rs2229621 0 +
> A A A/T genomic single
> by-cluster,by-frequency,by-submitter,by-hapmap,by-1000genomes
> 0.378067 0.214706 missense exact 1 15
> 1000GENOMES,APPLERA_GI,BGI,CANCER-GENOME,COMPLETE_GENOMICS,CSHL-HAPMAP,EGP_SNPS,HGSV,IBARROSO,ILLUMINA,IMCJ-GDT,PERLEGEN,SEATTLESEQ,WICVAR,YUSUKE,
> 3 T,A,0, 423.000000,299.000000,6.000000, 0.581044,0.410714,0.008242,
> maf-5-some-pop,maf-5-all-pops
>
> Column 23 is: T,A,0,
> What does "0" mean?
> (These "0"s are very common for chrX.)
>
> Another example, here is rs3215906:
>
> 764 chr1 23518470 23518471 rs3215906 0 - A A
> -/T genomic
> deletion by-cluster,by-frequency 0.270941 0.249121 unknown
> exact1 4
> BUSHMAN,DEVINE_LAB,SNP500CANCER,YUSUKE, 3 T,-,+,
> 1.000000,62.000000,331.000000, 0.002538,0.157360,0.840102,
> maf-5-some-pop,maf-5-all-pops
>
> Column 23 is: T,-,+,
> What does "+" mean?
>
> Have a nice day,
> Martin Frith
> http://www.cbrc.jp/~martin/
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
