Hi, John! Queries that take more than a few minutes to run are probably inappropriate for the shared public mysql server.
I found this query formulation for you that takes less than one minute: select name, observed, count(*) from( (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') union (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') union (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') union (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16') ) resultAlias group by name, observed having count(*) = 4 limit 30; +-----------+----------+----------+ | name | observed | count(*) | +-----------+----------+----------+ | rs1000014 | A/G | 4 | | rs1000047 | C/T | 4 | | rs1000077 | C/G | 4 | | rs1000078 | A/G | 4 | | rs1000100 | A/T | 4 | | rs1000174 | A/G | 4 | | rs1000178 | C/T | 4 | | rs1000192 | A/G | 4 | | rs1000193 | A/C | 4 | | rs1000454 | C/G | 4 | | rs1000455 | A/T | 4 | | rs1000710 | G/T | 4 | | rs1000711 | C/G | 4 | | rs1000720 | A/G | 4 | | rs1000742 | C/T | 4 | | rs1001157 | A/G | 4 | | rs1001170 | G/T | 4 | | rs1001171 | A/T | 4 | | rs1001302 | A/G | 4 | | rs1001362 | C/T | 4 | | rs1001366 | C/T | 4 | | rs1001493 | C/T | 4 | | rs1001554 | A/G | 4 | | rs1001608 | C/T | 4 | | rs1001631 | C/G | 4 | | rs1001655 | A/G | 4 | | rs1001722 | G/T | 4 | | rs1001776 | C/T | 4 | | rs1001861 | A/G | 4 | | rs1001871 | C/G | 4 | +-----------+----------+----------+ 30 rows in set (46.17 sec) Of course for your own full output, you would remove the "limit" clause. In case you are curious how many there are: select count(*) from ( select name, observed, count(*) from( (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') union (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') union (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') union (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16') ) resultAlias group by name, observed having count(*) = 4) resultAlias2; +----------+ | 105841 | +----------+ 1 row in set (47.60 sec) Another alternative would be to capture the output from each like this: select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16' for each of your 4 files. You could sort them by name (rsId) either with an order by clause in sql, or with the unix sort command. You can even use the unix join command to join them up on the name and observed fields. Once the contents of each of the 4 sets are sorted by name and observed, joining them can be very fast. -Galt 4/1/2011 8:25 AM, John Hayward: > I would like to run queries against the genome-mysql.cse.ucsc.edu database > which may be excessive and don't want to cause problems for others. > > I want to find matches for a particular chromosome which have the same name > and observation for tables hapmapSnpsCEU, haphapmapSnpsYRI, mapSnpsCHB, > hapmapSnpsJPT. > > Doing a query to pickup the count of hapmapSnpsCEU for one chromosome took > 0.14 seconds. > If I do a query to pick up the count joining hapmapSnpsCEU and hapmapSnpCHB > took 8.40 seconds. > > If I join all tables would that constitute an excessive load? > > Below is the query joining two tables. > ====== > select count(*) from hapmapSnpsCEU, hapmapSnpsCHB where hapmapSnpsCEU.chrom = > 'Chr16' and hapmapSnpsCHB.chrom = 'Chr16' and hapmapSnpsCEU.name = > hapmapSnpsCHB.name and hapmapSnpsCEU.observed = hapmapSnpsCHB.observed; > ====== > johnh... > > > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
