On Tue, Apr 14, 2009 at 04:46:59PM +0200, Ruijgrok, P.T. (Peter) wrote:
> Problem 2:
> ---------------
> checker -d 10h  processes approximately 250 bitstreams / hour
>  
> checker -d 1874/12233 (collection handle) processes approximately 20.000
> bitstreams / hour
>  
> So running the checker every night for eg 10 hours is not possible.....
>  
>  
> Cause ( I think):
> ----------------------
> select bitstream_id  
>             from most_recent_checksum where to_be_processed = true 
>             order by date_trunc('milliseconds', last_process_end_date),
>             bitstream_id ASC LIMIT 1;
> 
> This statement requests the next bitstream to check and it takes +10
> seconds to find the next bitstream on our 320.000+   dspace.

I don't *know* what is causing this, and I'm not yet a DBMS
performance expert, but my eye is attracted to the ORDER BY clause.
The monitor is having to build a temporary table of some probably very
sizable subset of 320.000 records, and sort it.  There is no index on
last_process_end_date.  That may not be significant, but I suppose
that the query planner might be able to eliminate the sorting if that
column were indexed.  It might be able to make other optimizations as
well, but I'm just guessing.

Anyway, assuming you are using PostgreSQL, you could try using EXPLAIN
to check out that command.  Just connect 'psql' to the correct
database and insert EXPLAIN in front of your query, and you will get
information on how the query planner views that query and potentially
spot the part that is costing so much.  See "Performance Tips", and
especially "Using EXPLAIN", in the Pg documentation.

-- 
Mark H. Wood, Lead System Programmer   [email protected]
Friends don't let friends publish revisable-form documents.

Attachment: pgpW9cFnJiiET.pgp
Description: PGP signature

------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and 
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today. 
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to