If your infact (sounds like) storing the pictures meta-data (name, size, owner, etc) and the data (blob of some kind) .. I would definately break up the design into 2 tables. That way when dealing with the meta-data table (your RAND() query) there is much less data that needs to be traversed to get your answer, which should result in a faster query.
On Mon, 8 Mar 2004, Neil Gunton wrote: > Hi all, > > I am using MySQL 4.0.x to run a community website which has (among other > things) over 19,000 pictures. There is a page that selects 30 random > thumbnails. I have noticed that the performance of "ORDER BY RAND()" on > this table has a significant impact on performace. I have all the > relevant indexes defined, and I have researched this issue on the Web. > It seems that other people have also encountered a performance hit while > using ORDER BY RAND(). The reason appears to be that when you do EXPLAIN > on a query using this, MySQL reports "Using temporary; Using filesort", > which is the worst possible result. Also, the number of rows reported is > pretty much the entire set. So, presumably, the current implementation > of ORDER BY RAND() means that MySQL has to traverse the entire table, > regardless of other indexes. > > There are, of course, other ways to get around this, but they are all > more complex than simply using ORDER BY RAND(). I think that selecting a > random number of records from a table is something that a lot of > websites would like to be able to do, and so as datasets get larger it > would be nice to see this function scale well. For anyone who has a > website with a large archive of data, the ability to present a random > selection of this data is very useful. > > I would like to know if anyone knows if the MySQL team is aware of this > problem, and if so whether they are planning on improving it at any > point. I ask mainly because if I am told that "yes, it'll be much better > in version X" then I can live with the couple of seconds that it takes > currently, knowing that this will be better down the line. However if I > am advised that this is a fundamentally hard problem for whatever > reason, then I will put the effort into reworking my tables to use an > alternative solution. > > The only real solution that I can see which is fast is to make another > table which contains just the unique IDs of the pictures that are > visible (there are others which are not publicly visible, and which > shouldn't be included in the random query, so making a separate table > with the appropriate subset makes sense for performance). This new table > will have a primary key which is a numeric "sequence" field. Every > record will have its own sequence number, going from 1 up to the number > of records. Then, instead of doing one query with ORDER BY RAND() LIMIT > 30, I can instead do 30 queries, each with a different random sequence > (generated from Perl), which will look up the unique sequence number. > Since this is a primary key, it will be very fast, so that doing 30 > queries will not have a big performance impact. However this scheme > requires that the sequences in the new table be kept very consistent - > for example, if a picture is removed from the sequence then the sequence > numbers above that record have to be updated. This introduces potential > for error, but it is a possible solution. I don't want to implement it, > obviously, if ORDER BY RAND() is slated for improvement. > > Thanks for any ideas or insights... > > -Neil Gunton > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]