> From: mos <mo...@fastmail.fm>
> 
> 
> At 10:34 AM 10/24/2011, you wrote:
>> select id from table order by rand() limit 1;
>> is doing as example a dumb temporary table with the full size
> 
> Because it has to sort the entire table, then it returns the one row. This of 
> course is extremely inefficient. :)

That is absolutely incredible and counter-intuitive, and (as you say) extremely 
inefficient!

This is used everywhere. Perhaps it is one of the biggest "anti-patterns" in 
SQL. I just checked two different SQL "cookbook" sites, and they both recommend 
ORDER BY RAND().

I just googled around a bit, and found that putting RAND() in the WHERE clause 
is very efficient:

SELECT id FROM table WHERE RAND() < 0.01 LIMIT 1

The comparison constant can be optimized for the number of rows you have. The 
above returns the first record of 1% of the table. If you have a million rows, 
you might want to bump that to something like 100 parts per million or so.

But really, folks, this is something so ubiquitous and so recommended, why 
can't the query optimizer look out for ORDER BY RAND() and simply skip the 
table sort and just grab some record? (Hopefully using something better than 
Knuth's LCRNG...)

----------------
Learning to think wholistically requires an overriding, or reversal, of much of 
the cultural heritage of the last few hundred years. -- David Holmgren
:::: Jan Steinman, EcoReality Co-op ::::


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to