On 16 Apr 2009, at 16:44, Mark H. Wood wrote:
> On Tue, Apr 14, 2009 at 04:46:59PM +0200, Ruijgrok, P.T. (Peter)
> wrote:
>> Problem 2:
>> ---------------
>> checker -d 10h processes approximately 250 bitstreams / hour
>>
>> checker -d 1874/12233 (collection handle) processes approximately
>> 20.000
>> bitstreams / hour
>>
>> So running the checker every night for eg 10 hours is not
>> possible.....
>>
>>
>> Cause ( I think):
>> ----------------------
>> select bitstream_id
>> from most_recent_checksum where to_be_processed = true
>> order by date_trunc('milliseconds',
>> last_process_end_date),
>> bitstream_id ASC LIMIT 1;
>>
>> This statement requests the next bitstream to check and it takes +10
>> seconds to find the next bitstream on our 320.000+ dspace.
>
> I don't *know* what is causing this, and I'm not yet a DBMS
> performance expert, but my eye is attracted to the ORDER BY clause.
> The monitor is having to build a temporary table of some probably very
> sizable subset of 320.000 records, and sort it. There is no index on
> last_process_end_date. That may not be significant, but I suppose
> that the query planner might be able to eliminate the sorting if that
> column were indexed. It might be able to make other optimizations as
> well, but I'm just guessing.
My first suggestion would be to take out that call to date_trunc().
Even if last_process_end_date is indexed, the results of truncating it
won't be. It's unnecessary and probably a significant performance hit.
So, time
select bitstream_id
from most_recent_checksum where to_be_processed = true
order by date_trunc('milliseconds', last_process_end_date),
bitstream_id ASC LIMIT 1;
against
select bitstream_id
from most_recent_checksum where to_be_processed = true
order by last_process_end_date,
bitstream_id ASC LIMIT 1;
and see what the difference is.
--
Simon Brown <[email protected]> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today.
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech