On 2012-10-09 17:24, Bob MacCallum wrote:
> Just for closure, I am now in business.
>
> [mysqld]
> tmp_table_size = 5G
>
> Background for whoever cares:
>
> I'm running something reporter-wise in parallel (4 threads) and wanted
> roughly equal workload per "chunk" of reporters (some reporters have
> more experiments+data than others), so having them in random order
> before chunking does the trick.
>
> The tmp tables that the SELECT * FROM Reporters ORDER BY RAND(123)
> creates are 3.2G each (times 4), and while doing this there's some
> serious I/O clogging the machine, but it pulls through. If only GNU
> parallel had a -stagger [secs] option!
ORDER BY RAND() is known for performance problems once the number of rows grows large. I guess it is because MySQL needs to create a temporary table with *all* the original data plus the random number for each row before the data can be sorted.

It might be more efficient to first do 'SELECT id FROM Reporters' (you'll need to use Reporter.getDynamicQuery() for this), put all the ids in an array, shuffle that array in memory, and then bring in the actual reporter data one by one (or 100 by 100, or something). That should at least avoid the temporary table issue, since the ids can be pulled in directly from the primary key index. In theory it should be quicker, but I guess it has to be verified in practice as well.

/Nicklas

_______________________________________________
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net
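For the archives, the id-shuffle approach suggested above can be sketched roughly as follows. This is a minimal Python demo using an in-memory SQLite database as a stand-in for the real MySQL/BASE setup; the table name, column names, and row count are all hypothetical, and in BASE itself the id query would go through Reporter.getDynamicQuery() rather than raw SQL.

```python
import random
import sqlite3

# Hypothetical stand-in for the Reporters table (schema is illustrative,
# not the actual BASE schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reporters (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO reporters (id, name) VALUES (?, ?)",
    [(i, "reporter-%d" % i) for i in range(1, 1001)],
)

# Step 1: pull only the primary keys. This can be served from the PK
# index and avoids the huge ORDER BY RAND() temporary table.
ids = [row[0] for row in conn.execute("SELECT id FROM reporters")]

# Step 2: randomize in memory, with a fixed seed for reproducibility
# (the original mail used RAND(123) for the same reason).
random.Random(123).shuffle(ids)

# Step 3: bring in the actual rows in batches of 100, in shuffled order.
batch_size = 100
rows = []
for start in range(0, len(ids), batch_size):
    batch = ids[start:start + batch_size]
    placeholders = ",".join("?" * len(batch))
    fetched = conn.execute(
        "SELECT id, name FROM reporters WHERE id IN (%s)" % placeholders,
        batch,
    ).fetchall()
    # IN (...) does not preserve order, so restore the shuffled order.
    by_id = {r[0]: r for r in fetched}
    rows.extend(by_id[i] for i in batch)

print(len(rows))  # 1000
```

The batch size is a tunable trade-off between round trips and per-query size; whether this beats ORDER BY RAND() in practice would still need to be measured on the real data.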