I think you are right about O log N performance when finding matching records within the index - however, the index doesn't contain all the data, so the mysql server has to hit the table as stored on disk to find what you were actually asking for (select *). That becomes time consuming as it sifts through all the data from disk to retrieve matching records starting and ending at the right points. Note in your EXPLAIN output that MySQL thinks it will have to examine about 506222 rows to find the matching one(s); that's a pretty large number of rows to examine (half your table).

That's where having a wider distribution of user_id values would help, I think. Of course, you haven't got that wider distribution, so that's a pointless discussion (I suggested looking for other user_ids earlier merely to confirm whether such queries were faster).

Not that this solves your problem, but try
SELECT user_id, fullname
FROM contacts WHERE user_id=1 ORDER BY fullname
LIMIT 1 OFFSET 500000;
as this should not hit the table on disk, only the index, since the user_id_2 index IS the data for this query. I'm more familiar with MyISAM than InnoDB but I think this holds true for both table types.

Dan



Adam Wolff wrote:
Thanks for the response, Dan. I did try ORDER BY on the table. Didn't help -- I presume because the query is using an index.

Unfortunately, the point of my current development is to show searches against millions of contacts, so the suggestion about working with other user_ids isn't too practical.

I will look into increasing the size of my data cache.

I guess what surprises me is that I thought that the index was stored as a BTree in sort order. I'm pretty bad with big-O, but I thought this would suggest O log N performance to find a given offset within the index.

A

On May 10, Dan Buettner wrote:

I would expect the problem to be that the further down in the data you go by
using OFFSET, the more records the mysql server has to scan in order to get to
what you want.  This will produce a fairly linear slowdown the further in you
go - it just takes time to check through 1,000,000 matching records.
Especially on desktop grade hardware where you probably haven't got the
fastest disk subsystem.

I think in this case your slow searches may be a result of the heavy bias in
your data toward user_id 1.  Try your search on some of your other user_ids
and see.  With so many records for the same user_id, your search for that
user_id is necessitating something pretty close to a table scan, even though
it's hitting an index.

Some suggestions would be to increase the size of your data cache, so that
after your first queries, the data (or more of it) is in memory. Assuming
you'll be deploying on server hardware, a faster disk system should help quite
a bit too.  Memory caches on hardware RAID systems can help with this kind of
thing too.

You might also try
ALTER TABLE contacts ORDER BY user_id, fullname
to get your data sorted into the same order you're looking through it, though
it may well affect other queries you need to run against the same data.  I'm
not certain whether you can ORDER BY more than one column:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
Also note that as you add or delete rows the table does not stay in order.

Hope this helps!

Dan


Adam Wolff wrote:
I have a very simple table that looks like this:
CREATE TABLE `contacts` (
`id` int(11) NOT NULL auto_increment,
`fullname` varchar(100) default NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY  (`id`),
KEY `user_id` (`user_id`),
KEY `user_id_2` (`user_id`,`fullname`),
CONSTRAINT `contacts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user`
(`id`)
ENGINE=InnoDB DEFAULT CHARSET=utf8


It's a bit of a lopsided table in that of the 1,000,100 records in the db,
1,000,000 of them belong to user_id 1. But I wouldn't expect this to
skew my results.

I am writing a little paging server that retrieves pages of data using
LIMIT and OFFSET.

I'm really surprised by how slowly my queries are running on a
relatively fast desktop machine. Records near the top of the list are
fine:
mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
      LIMIT 1 OFFSET 0;
+--------+--------------+-----------------------------+---------+----------+ | id | fullname | email | user_id | nickname | +--------+--------------+-----------------------------+---------+----------+ | 371543 | Aaron Abbott | [EMAIL PROTECTED] | 1 | aaronab | +--------+--------------+-----------------------------+---------+----------+ 1 row in set (0.03 sec)

But as I move down the list, queries run slower and slower:
mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
      LIMIT 1 OFFSET 100000;
+--------+--------------+-----------------------------+---------+----------+ | id | fullname | email | user_id | nickname | +--------+--------------+-----------------------------+---------+----------+ | 726543 | Benny Abbott | [EMAIL PROTECTED] | 1 | bennyab | +--------+--------------+-----------------------------+---------+----------+ 1 row in set (2.94 sec)

mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
      LIMIT 1 OFFSET 500000;
+--------+---------------+------------------------------+---------+----------+ | id | fullname | email | user_id | nickname
|
+--------+---------------+------------------------------+---------+----------+ | 309543 | Jimmie Abbott | [EMAIL PROTECTED] | 1 | jimmieab
|
+--------+---------------+------------------------------+---------+----------+ 1 row in set (12.75 sec)

EXPLAIN says:
+----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key |
key_len | ref   | rows   | Extra       |
+----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+ | 1 | SIMPLE | contacts | ref | user_id,user_id_2 | user_id_2 |
4       | const | 506222 | Using where |
+----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+
In other words, it *is* using an index for this query. Anyone have any
advice for me?

Thanks,
Adam




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to