Hello,

I'm experimenting with a large table (currently 9.7 M rows / 500 MB on
disk) that stores measurement results, and I sometimes (!) get slow
queries for no apparent reason. The table looks like this:

+------------------------+
|                 values |
+------------------------+
| int (11)      sourceid |
| timestamp (14)   dummy |
| timestamp (14)   start |
| int (11)      newstart |
| mediumint (9) duration |
| float            value |
| tinyint (4)  valuetype |
+------------------------+

CREATE TABLE values (
  sourceid int(11) NOT NULL default '0',
  dummy timestamp(14) NOT NULL,
  start timestamp(14) NOT NULL,
  newstart int(11) NOT NULL default '0',
  duration mediumint(9) NOT NULL default '0',
  value float NOT NULL default '0',
  valuetype tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (valuetype,sourceid,newstart),
  KEY sourceid (sourceid),
  KEY start (start)
) TYPE=MyISAM;

The fields 'dummy' and 'start' are deprecated and will be removed as
soon as all my scripts have been altered to match the new structure. I
think they're not relevant for my problem (but who knows?). The index I
use consists of three fields: valuetype (either 0 or 8 so far), sourceid
(approx. 500 different values) and newstart (containing UNIX-timestamps,
hardly any duplicates at all), in that order. 
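
For reference, here's how I check and refresh the key distribution on
that table (output left out; nothing here is specific to my setup
beyond the table name):

SHOW INDEX FROM values;    -- per-column cardinality estimates for the primary key and the two extra keys
ANALYZE TABLE values;      -- refreshes the key distribution statistics for the MyISAM table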

I'm using temporary, disk-based MyISAM tables for calculations. The
first step of that process is to load the desired part of the data table
into the temporary table using a statement like:

INSERT INTO result_table
                ...
SELECT result_1,
       result_2,
       ...
       result_n
FROM values 
WHERE sourceid IN (1,2,3)
  AND (newstart+duration) > some_UNIX_timestamp
  AND newstart < some_other_UNIX_timestamp
  AND valuetype=8;
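
(The plan check I mention further down is just EXPLAIN on the SELECT
part. A concrete example would look like this - the column list and
the two timestamps are placeholders for illustration, not the real
values:)

EXPLAIN
SELECT sourceid, newstart, duration, value
FROM values
WHERE sourceid IN (1,2,3)
  AND (newstart+duration) > UNIX_TIMESTAMP('2003-10-01 11:51:07')
  AND newstart < UNIX_TIMESTAMP('2003-10-08 11:51:07')
  AND valuetype=8;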

Even after several rounds of optimizing the whole process (which
involves three more queries and another temporary table), an average
total execution time of around 3-5 seconds is perfectly normal, as are
peak times of up to 30 seconds. For some reason, though, I
occasionally get execution times of several minutes for the above
query ALONE (the other queries are much faster), without any obvious
correlation to the number of rows retrieved. To gather a little more
statistics, I set long_query_time down to 30 seconds, which made all
the interesting queries show up in the slow query log. Below are some
of the logged results for 'sourceid IN(429,430,431) ... AND
valuetype=0':


+-------+--------------------------+---------------------------+------------+---------------+---------------------+
| query | FROM_UNIXTIME(left_edge) | FROM_UNIXTIME(right_edge) | exec. time | rows examined | execution date      |
+-------+--------------------------+---------------------------+------------+---------------+---------------------+
|     1 |      2003-10-01 11:51:07 |       2003-10-08 11:51:07 | >214 sec.  |     5'946'933 | 2003-10-08 11:55:43 |
|     2 |      2003-10-01 11:51:07 |       2003-10-08 11:51:07 | >153 sec.  |     4'898'654 | 2003-10-08 11:55:55 |
|     3 |      2003-07-07 10:01:28 |       2003-10-08 12:13:19 |   52 sec.  |        97'062 | 2003-10-08 12:17:46 |
+-------+--------------------------+---------------------------+------------+---------------+---------------------+

and some information on the table's content (valuetype=0 at 2003-10-08
16:10:17):

+----------+------------------------------+------------------------------+-----------------+
| sourceid | FROM_UNIXTIME(MIN(newstart)) | FROM_UNIXTIME(MAX(newstart)) | COUNT(newstart) |
+----------+------------------------------+------------------------------+-----------------+
|      429 |          2003-07-07 10:04:04 |          2003-10-08 16:05:18 |           30059 |
|      430 |          2003-07-07 10:02:55 |          2003-10-08 16:07:28 |           33155 |
|      431 |          2003-07-07 10:01:28 |          2003-10-08 16:07:34 |           28209 |
+----------+------------------------------+------------------------------+-----------------+

Obviously, in the two identical queries above (1 and 2), MySQL was
attempting a full table scan (and was stopped by a quick-and-dirty
safety script of mine, which wakes up every minute and kills every
process older than 180 seconds - a sketch of what it does follows
below; I know this isn't good practice, but I can't have massive
queries blocking incoming updates), whereas in query 3 only the rows
with a matching sourceid were scanned (the second table shows fewer
than 97'062 records because rows with valuetype<>0 weren't counted).
As the second table suggests, query 3 retrieved all rows with
valuetype=0, whereas queries 1 and 2 only tried to fetch the values of
the last 7 days. One might take this as a hint, but, as mentioned
above, most of the time the queries run smoothly, and querying some
particular portion of data in the middle of the table usually works
fine. Also, EXPLAIN shows very modest row estimates for all queries.
CHECK TABLE and REPAIR TABLE just report 'status: OK' and aren't of
much use, though it seems to me that checking/repairing the table
helps for a while.
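
(For completeness, the safety script essentially does the equivalent
of the following by hand once a minute - a rough sketch, not the
actual script; the thread id is just an example taken from the
processlist output:)

SHOW FULL PROCESSLIST;   -- look for threads whose Time column exceeds 180 seconds
KILL 12345;              -- kill such a thread; 12345 stands for the Id from the processlist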

I was able to reproduce the phenomenon repeatedly with the very same
query from the slow query log. Then I tried altering the WHERE clause
to match the field order in the index and was glad to see the query go
through as fast as usual -- until I did the cross-check and voilà: the
original version of the query now magically went through just as fast.
Meanwhile, some other queries had found their way into the slow query
log...
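
To make that experiment concrete, the two variants looked roughly like
this (column list shortened, edge values taken from the log entries
above):

-- original order, as generated by my scripts:
SELECT sourceid, newstart, duration, value
FROM values
WHERE sourceid IN (429,430,431)
  AND (newstart+duration) > UNIX_TIMESTAMP('2003-10-01 11:51:07')
  AND newstart < UNIX_TIMESTAMP('2003-10-08 11:51:07')
  AND valuetype=0;

-- reordered to match the field order in the index (valuetype, sourceid, newstart):
SELECT sourceid, newstart, duration, value
FROM values
WHERE valuetype=0
  AND sourceid IN (429,430,431)
  AND newstart < UNIX_TIMESTAMP('2003-10-08 11:51:07')
  AND (newstart+duration) > UNIX_TIMESTAMP('2003-10-01 11:51:07');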

I can't really imagine that this is a bug, but I have no idea what my
mistake is. Does the field order in the WHERE-clause in any way affect
the speed of the query? Can anybody explain the inconsistency in
reproducing the phenomenon?

Now, I'm aware that I have a rather large table and that the
calculations I'm doing require some processing time. Additionally,
there are between 10 and 100 INSERTs into the table every minute,
which I know doesn't make things any better. But turning off the data
feed did not solve the problem, and most queries run without trouble.
There's hardly anything else running on the system, and mysqld
(including InnoDB) is granted about 50% of all memory.

I'm running a 4.0.15 binary distribution on SuSE Linux. My settings
are as follows (InnoDB included just in case; I only use MyISAM in the
above example):

key_buffer=64M
max_allowed_packet=1M
table_cache=64
sort_buffer=512K
net_buffer_length=8K
myisam_sort_buffer_size=8M

innodb_mirrored_log_groups=1
innodb_log_files_in_group=3
innodb_log_file_size=10M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=0
innodb_log_archive=0
innodb_buffer_pool_size=30M
innodb_additional_mem_pool_size=5M
innodb_file_io_threads=4
innodb_lock_wait_timeout=50
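
(For scale, the size of the index file can be compared to key_buffer
using SHOW TABLE STATUS; the Index_length column shows it in bytes:)

SHOW TABLE STATUS LIKE 'values';   -- Data_length / Index_length in bytes for the MyISAM table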

Any suggestions on how to monitor the queries? Unfortunately, I don't
have a --with-debug build installed. Is there a way to add the
debugging kit now, and could it be of any use? I'm in the unfortunate
situation of trying to improve the database architecture in a live
production environment to match our grown needs, while the current
setup is already close to its limits and needs care on a daily basis.
I could use some tips on that as well...

Thanks for any help - 'my Latin has come to an end' as we Germans would
say.

Hanno

------------------------------------------------------------------------------------
Hanno Fietz

dezem GmbH
Lohmeyerstr. 9
10587 Berlin

Tel.: 030 / 34 70 50 22
Fax: 030 / 34 70 50 21
www.dezem.de
------------------------------------------------------------------------------------


