Re: idle query

Dan Nelson Mon, 26 Jul 2010 20:31:47 -0700

In the last episode (Jul 26), Mike Spreitzer said:
> A colleague is running MySQL community server 5.1.34 on RHEL 5 on a big
> Xeon-based SMP (16 CPUs, 64 GB memory).  It is taking a surprisingly long
> time to execute a query, yet is not working particularly hard at it.  I
> wonder why this might be.  Following are details.  First, some `vmstat`
> output that shows the machine is doing almost nothing (I have inserted
> some additional spaces in the header to make it line up better); it shows
> no CPU activity and very little I/O:
> 
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- --system-- 
> -----cpu------
>  r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy   
> id wa st
>  0  0      0 56954556 328608 1040188    0    0  2600   130    1    1  1  0  
> 99  0  0
>  0  1      0 56954564 328608 1040188    0    0    74     6  301  961  0  0  
> 100  0  0
>  0  0      0 56954564 328608 1040188    0    0    73     1  287  970  0  0  
> 100  0  0
>  0  0      0 56954564 328608 1040188    0    0    73     5  297  925  0  0  
> 100  0  0
>  0  0      0 56954564 328608 1040188    0    0    71    51  291  926  0  0  
> 100  0  0


iostat -x output would be helpful here, too, so we can see whether your
disks are at 100% busy.

You're doing ~75 I/O's per second, so it looks like you're running on slow
SATA drives (a 4500 RPM drive will tops out at 75 iops); with drives like
that, anything that forces random I/O to a large table is going to kill you. 
MySQL runs a single thread for each submitted query (i.e.  no Parallel Query
option like Oracle has), so if a query needs to do 14 million random reads,
that'll be 1400000/70/3600 ~= 6 hours just to fetch table data (let alone
the index lookups required to find the table rows).

MySQL also only has a simple index join type on disk-based tables, so no
hash joins (which would load your 2nd table into RAM temporarily as a hash
table).  It does use hash indexes on memory tables though, so if you load
fldrcv into a memory table and join fldsnd against it, that might work
pretty well.

Another option would be to create an index on fldrcv that covers both your
WHERE clause fields and your SELECT clause fields.  MySQL will be able to
use that index to satisfy the query's requirements without going to the
table itself.  Then you can use the LOAD INDEX INTO CACHE command to preload
that index into memory.  This ends up working similar to the memory table
option (no disk I/O needed for the 2nd table), with tradeoffs on both sides
(a preloaded index consumes disk space and mysql has to keep it updated when
the table is updated, but a memory table has to be dropped and rebuilt every
time you run your main query if the fldrcv table changes often).  Which
option you choose depends on how often you run the query, and how often
fldrcv changes.
 
> create table fldpar (p VARCHAR(200) NOT NULL,
>  rcd DATETIME NOT NULL, rcms SMALLINT NOT NULL, msgid BIGINT NOT  NULL, q
>  VARCHAR(200) NOT NULL, scd DATETIME NOT NULL, scms SMALLINT NOT NULL,
>  INDEX p(p), INDEX q(q) ) as select fldrcv.p, fldrcv.cd as rcd, fldrcv.cms
>  as rcms, fldrcv.msgid as msgid, fldrcv.q, fldsnd.cd as scd, fldsnd.cms as
>  scms from fldrcv, fldsnd where fldrcv.q=fldsnd.p AND
>  fldrcv.qboot=fldsnd.pboot AND fldrcv.msgid=fldsnd.msgid;

> This statement makes a new table by joining two existing tables.  Here is 
> what `explain` has to say about the select part of the statement:
> 
> mysql> explain select fldrcv.p, fldrcv.cd as rcd, fldrcv.cms as rcms,
>     ->  fldrcv.msgid as msgid, fldrcv.q, fldsnd.cd as scd,
>     ->  fldsnd.cms as scms from fldrcv, fldsnd where
>     ->  fldrcv.q=fldsnd.p AND fldrcv.qboot=fldsnd.pboot AND
>     ->  fldrcv.msgid=fldsnd.msgid;
> +----+-------------+--------+------+---------------+------+---------+------------------------------------------------------------------------------------------+-------+-------------+
> | id | select_type | table  | type | possible_keys | key  | key_len | ref     
>        | rows  | Extra       |
> +----+-------------+--------+------+---------------+------+---------+------------------------------------------------------------------------------------------+-------+-------------+
> |  1 | SIMPLE      | fldsnd | ALL  | pec,pbm       | NULL | NULL    | NULL    
>         | 29036 |             | 
> |  1 | SIMPLE      | fldrcv | ref  | qbm           | qbm  | 220     |  
> bigCell2906_flood.fldsnd.p,bigCell2906_flood.fldsnd.pboot,bigCell2906_flood.fldsnd.msgid
>   |   452 | Using where | 
> +----+-------------+--------+------+---------------+------+---------+------------------------------------------------------------------------------------------+-------+-------------+
> 
> The fldrcv table has an index on precisely the fields used in this join. 
> There are about 14 million rows in that table, which is about 480 times as
> many rows as there are in the fldsnd table.  I expect the result to be no
> larger than the fldrcv table.  So it looks like the index is making this
> query run about as fast as can be expected, right?  It did not take
> anywhere near 9 hours to make the fldrcv table ...  so why is it taking so
> long to do this join to make the fldpar table?

-- 
        Dan Nelson
        [email protected]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[email protected]

Re: idle query

Reply via email to