Re: [Genome] when is a query excessive.

Galt Barber Fri, 01 Apr 2011 10:27:40 -0700

Hi, John!

Queries that take more than a few minutes to run are
probably inappropriate for the shared public mysql server.


I found this query formulation for you that takes less than one minute:

select name, observed, count(*) from(
(select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') 
union
(select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') 
union
(select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') 
union
(select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
) resultAlias group by name, observed having count(*) = 4 limit 30;

+-----------+----------+----------+
| name      | observed | count(*) |
+-----------+----------+----------+
| rs1000014 | A/G      |        4 |
| rs1000047 | C/T      |        4 |
| rs1000077 | C/G      |        4 |
| rs1000078 | A/G      |        4 |
| rs1000100 | A/T      |        4 |
| rs1000174 | A/G      |        4 |
| rs1000178 | C/T      |        4 |
| rs1000192 | A/G      |        4 |
| rs1000193 | A/C      |        4 |
| rs1000454 | C/G      |        4 |
| rs1000455 | A/T      |        4 |
| rs1000710 | G/T      |        4 |
| rs1000711 | C/G      |        4 |
| rs1000720 | A/G      |        4 |
| rs1000742 | C/T      |        4 |
| rs1001157 | A/G      |        4 |
| rs1001170 | G/T      |        4 |
| rs1001171 | A/T      |        4 |
| rs1001302 | A/G      |        4 |
| rs1001362 | C/T      |        4 |
| rs1001366 | C/T      |        4 |
| rs1001493 | C/T      |        4 |
| rs1001554 | A/G      |        4 |
| rs1001608 | C/T      |        4 |
| rs1001631 | C/G      |        4 |
| rs1001655 | A/G      |        4 |
| rs1001722 | G/T      |        4 |
| rs1001776 | C/T      |        4 |
| rs1001861 | A/G      |        4 |
| rs1001871 | C/G      |        4 |
+-----------+----------+----------+
30 rows in set (46.17 sec)

Of course for your own full output,
you would remove the "limit" clause.

In case you are curious how many there are:

select count(*) from (
select name, observed, count(*) from(
(select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') 
union
(select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') 
union
(select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') 
union
(select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
) resultAlias group by name, observed having count(*) = 4) resultAlias2;
+----------+
|   105841 |
+----------+
1 row in set (47.60 sec)


Another alternative would be to capture the output from each like this:

select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16'

for each of your 4 files.
You could sort them by name (rsId) either with an order by clause in 
sql, or with the unix sort command.

You can even use the unix join command to join them up on the name and 
observed fields.

Once the contents of each of the 4 sets are sorted by name and observed,
joining them can be very fast.

-Galt

4/1/2011 8:25 AM, John Hayward:
>   I would like to run queries against the genome-mysql.cse.ucsc.edu database 
> which may be excessive and don't want to cause problems for others.
>
> I want to find matches for a particular chromosome which have the same name 
> and observation for tables  hapmapSnpsCEU, haphapmapSnpsYRI, mapSnpsCHB, 
> hapmapSnpsJPT.
>
> Doing a query to pickup the count of hapmapSnpsCEU for one chromosome took 
> 0.14 seconds.
> If I do a query to pick up the count joining hapmapSnpsCEU and hapmapSnpCHB 
> took 8.40 seconds.
>
> If I join all tables would that constitute an excessive load?
>
> Below is the query joining two tables.
> ======
> select count(*) from hapmapSnpsCEU, hapmapSnpsCHB where hapmapSnpsCEU.chrom = 
> 'Chr16' and hapmapSnpsCHB.chrom = 'Chr16' and hapmapSnpsCEU.name = 
> hapmapSnpsCHB.name and hapmapSnpsCEU.observed = hapmapSnpsCHB.observed;
> ======
> johnh...
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Re: [Genome] when is a query excessive.

Reply via email to