Hi Craig,

One of our engineers suggests the following:

There isn't a unique key because when a SNP is implicated by more than one study, there are multiple rows for it, each containing the metadata from one study.
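If it helps to see this in the data, here is a rough sketch (not something we have run against the current table) of two small shell helpers that list repeated values in a tab-separated gwasCatalog dump. Column 5 is the name (rs ID) column, as in the cut command used in this thread. Column 6 being pubMedID is an assumption on my part, so please check it against the table schema before relying on it. The function names are made up for illustration.

```shell
# Sketch only: both helpers read tab-separated gwasCatalog rows on stdin.
# Column 5 = name (rs ID), matching the cut -f 5 command in this thread.
# Column 6 = pubMedID is an ASSUMPTION; confirm it against the table schema.

# Print each rs ID that occurs in more than one row.
dup_rsids() {
    cut -f 5 | sort | uniq -d
}

# Print each (name, pubMedID) pair that occurs in more than one row.
dup_name_pmid() {
    cut -f 5,6 | sort | uniq -d
}

# Usage, assuming gwasCatalog.txt.gz has already been downloaded:
#   zcat gwasCatalog.txt.gz | dup_rsids | wc -l
#   zcat gwasCatalog.txt.gz | dup_name_pmid | wc -l
```

The first count shows how many rs IDs appear in more than one row. The second shows how many name/pubMedID pairs are still repeated, which is a quick way to judge whether that pair works as a key for your subset.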
If you need a distinct list of rs IDs, here is how to get that using mysql:

    mysql [[insert_public_mysql_stuff_here]] hg19 -NBe 'select distinct(name) from gwasCatalog' > gwasRsIds.txt

or using a downloaded file:

    wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gwasCatalog.txt.gz
    zcat gwasCatalog.txt.gz | cut -f 5 | sort -u > gwasRsIds.txt

If you want to keep the study info, a key could be constructed from the name and pubMedID columns. It would be unique except for multiply-mapped SNPs, but those are excluded from snp132Common.

How to proceed depends on what tool(s) you are using (mysql? command-line? Table Browser/Galaxy? etc.) and how much info you want to keep from gwasCatalog.

Please contact us again at [email protected] if you have any further questions.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group

On 11/1/2011 6:16 AM, Benson, Craig C wrote:
> Hi,
>
> I was wondering, for the table "gwasCatalog" in the hg19 database, is there a
> unique key field(s) for each entry. There are 7,096 rows, but some SNPs have
> more than one entry in the database, based on the disease association. I'm
> trying to join the tables "snp132Common" and "gwasCatalog" for only a subset
> of SNPs.
>
> Thanks
> _______________________________________________
> Genome maillist [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
