If your infact (sounds like) storing the pictures meta-data (name, size,
owner, etc) and the data (blob of some kind) .. I would definately break
up the design into 2 tables.  That way when dealing with the meta-data
table (your RAND() query) there is much less data that needs to be
traversed to get your answer, which should result in a faster query.




On Mon, 8 Mar 2004, Neil Gunton wrote:

> Hi all,
>
> I am using MySQL 4.0.x to run a community website which has (among other
> things) over 19,000 pictures. There is a page that selects 30 random
> thumbnails. I have noticed that the performance of "ORDER BY RAND()" on
> this table has a significant impact on performace. I have all the
> relevant indexes defined, and I have researched this issue on the Web.
> It seems that other people have also encountered a performance hit while
> using ORDER BY RAND(). The reason appears to be that when you do EXPLAIN
> on a query using this, MySQL reports "Using temporary; Using filesort",
> which is the worst possible result. Also, the number of rows reported is
> pretty much the entire set. So, presumably, the current implementation
> of ORDER BY RAND() means that MySQL has to traverse the entire table,
> regardless of other indexes.
>
> There are, of course, other ways to get around this, but they are all
> more complex than simply using ORDER BY RAND(). I think that selecting a
> random number of records from a table is something that a lot of
> websites would like to be able to do, and so as datasets get larger it
> would be nice to see this function scale well. For anyone who has a
> website with a large archive of data, the ability to present a random
> selection of this data is very useful.
>
> I would like to know if anyone knows if the MySQL team is aware of this
> problem, and if so whether they are planning on improving it at any
> point. I ask mainly because if I am told that "yes, it'll be much better
> in version X" then I can live with the couple of seconds that it takes
> currently, knowing that this will be better down the line. However if I
> am advised that this is a fundamentally hard problem for whatever
> reason, then I will put the effort into reworking my tables to use an
> alternative solution.
>
> The only real solution that I can see which is fast is to make another
> table which contains just the unique IDs of the pictures that are
> visible (there are others which are not publicly visible, and which
> shouldn't be included in the random query, so making a separate table
> with the appropriate subset makes sense for performance). This new table
> will have a primary key which is a numeric "sequence" field. Every
> record will have its own sequence number, going from 1 up to the number
> of records. Then, instead of doing one query with ORDER BY RAND() LIMIT
> 30, I can instead do 30 queries, each with a different random sequence
> (generated from Perl), which will look up the unique sequence number.
> Since this is a primary key, it will be very fast, so that doing 30
> queries will not have a big performance impact. However this scheme
> requires that the sequences in the new table be kept very consistent -
> for example, if a picture is removed from the sequence then the sequence
> numbers above that record have to be updated. This introduces potential
> for error, but it is a possible solution. I don't want to implement it,
> obviously, if ORDER BY RAND() is slated for improvement.
>
> Thanks for any ideas or insights...
>
> -Neil Gunton
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to