RE: ORDER BY RAND() performance

Donny Simonton Mon, 08 Mar 2004 12:48:53 -0800

Exactly, it's faster for us to pull all of the data and then randomize it
locally.  We have benchmarked it both ways and the local randomize was
faster.  Now that's if you want multiple records returned.


Now if you only want one record, what we do, is create a random number, and
then just do a limit 19345, 1 or something like that.

We have tried another option which we stopped using which was creating 30
random numbers and then doing 30 select statements.  But that was slower
overall than 1 select with order by rand.  One other option that we use
sometimes is say you need 30 results randomized, and you have an
auto-increment in your table.  Create 30 random numbers, then do a select
with something like this:

Select * from blabla where lkajsdlkjas IN (10, 43, 22, 8981, etc...)

This works fairly well, but then again, I haven't benchmarked it in a while
and don't really remember how well it works.  Actually, I just tried this on
a table with 43 million entries and it took 0.0004 seconds.

Just some ideas.

Donny

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 08, 2004 2:36 PM
> To: Donny Simonton
> Cc: 'Neil Gunton'; 'MySQL'
> Subject: RE: ORDER BY RAND() performance
>
>
> Donny,  what do you do?  Throw all the values into an array or something
> on the client side, and use a random number generator to pull out the
> array elements?
>
> I suppose (depending on resultset size) pulling that many rows from server
> to client and handing on client side could be faster...
>
>
>
>
> On Mon, 8 Mar 2004, Donny Simonton wrote:
>
> > ORDER BY RAND(), just sucks in my opinion.  We have created our own
> internal
> > randomization system because pretty much everytime you use it will show
> up
> > in the slow query log, because of the using temporary, using filesort it
> > does.  Splitting your data into a hundred tables will still make it
> "using
> > temporary, using filesort".
> >
> > I just did a little test, where I only had 5 entries in a table, and I
> using
> > temp using filesort.
> >
> > Will it ever be improved?  Probably the same time order by DESC is
> improved.
> >
> > Donny
> >
> >
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, March 08, 2004 2:05 PM
> > > To: Neil Gunton
> > > Cc: MySQL
> > > Subject: Re: ORDER BY RAND() performance
> > >
> > >
> > > If your infact (sounds like) storing the pictures meta-data (name,
> size,
> > > owner, etc) and the data (blob of some kind) .. I would definately
> break
> > > up the design into 2 tables.  That way when dealing with the meta-data
> > > table (your RAND() query) there is much less data that needs to be
> > > traversed to get your answer, which should result in a faster query.
> > >
> > >
> > >
> > >
> > > On Mon, 8 Mar 2004, Neil Gunton wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am using MySQL 4.0.x to run a community website which has (among
> other
> > > > things) over 19,000 pictures. There is a page that selects 30 random
> > > > thumbnails. I have noticed that the performance of "ORDER BY RAND()"
> on
> > > > this table has a significant impact on performace. I have all the
> > > > relevant indexes defined, and I have researched this issue on the
> Web.
> > > > It seems that other people have also encountered a performance hit
> while
> > > > using ORDER BY RAND(). The reason appears to be that when you do
> EXPLAIN
> > > > on a query using this, MySQL reports "Using temporary; Using
> filesort",
> > > > which is the worst possible result. Also, the number of rows
> reported is
> > > > pretty much the entire set. So, presumably, the current
> implementation
> > > > of ORDER BY RAND() means that MySQL has to traverse the entire
> table,
> > > > regardless of other indexes.
> > > >
> > > > There are, of course, other ways to get around this, but they are
> all
> > > > more complex than simply using ORDER BY RAND(). I think that
> selecting a
> > > > random number of records from a table is something that a lot of
> > > > websites would like to be able to do, and so as datasets get larger
> it
> > > > would be nice to see this function scale well. For anyone who has a
> > > > website with a large archive of data, the ability to present a
> random
> > > > selection of this data is very useful.
> > > >
> > > > I would like to know if anyone knows if the MySQL team is aware of
> this
> > > > problem, and if so whether they are planning on improving it at any
> > > > point. I ask mainly because if I am told that "yes, it'll be much
> better
> > > > in version X" then I can live with the couple of seconds that it
> takes
> > > > currently, knowing that this will be better down the line. However
> if I
> > > > am advised that this is a fundamentally hard problem for whatever
> > > > reason, then I will put the effort into reworking my tables to use
> an
> > > > alternative solution.
> > > >
> > > > The only real solution that I can see which is fast is to make
> another
> > > > table which contains just the unique IDs of the pictures that are
> > > > visible (there are others which are not publicly visible, and which
> > > > shouldn't be included in the random query, so making a separate
> table
> > > > with the appropriate subset makes sense for performance). This new
> table
> > > > will have a primary key which is a numeric "sequence" field. Every
> > > > record will have its own sequence number, going from 1 up to the
> number
> > > > of records. Then, instead of doing one query with ORDER BY RAND()
> LIMIT
> > > > 30, I can instead do 30 queries, each with a different random
> sequence
> > > > (generated from Perl), which will look up the unique sequence
> number.
> > > > Since this is a primary key, it will be very fast, so that doing 30
> > > > queries will not have a big performance impact. However this scheme
> > > > requires that the sequences in the new table be kept very consistent
> -
> > > > for example, if a picture is removed from the sequence then the
> sequence
> > > > numbers above that record have to be updated. This introduces
> potential
> > > > for error, but it is a possible solution. I don't want to implement
> it,
> > > > obviously, if ORDER BY RAND() is slated for improvement.
> > > >
> > > > Thanks for any ideas or insights...
> > > >
> > > > -Neil Gunton
> > > >
> > > > --
> > > > MySQL General Mailing List
> > > > For list archives: http://lists.mysql.com/mysql
> > > > To unsubscribe:
> > > http://lists.mysql.com/[EMAIL PROTECTED]
> > > >
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:
> > > http://lists.mysql.com/[EMAIL PROTECTED]
> >
> >
> >
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: ORDER BY RAND() performance

Reply via email to