On Tue, Apr 14, 2009 at 04:46:59PM +0200, Ruijgrok, P.T. (Peter) wrote:
> Problem 2:
> ---------------
> checker -d 10h processes approximately 250 bitstreams / hour
>
> checker -d 1874/12233 (collection handle) processes approximately 20.000
> bitstreams / hour
>
> So running the checker every night for eg 10 hours is not possible.....
>
>
> Cause ( I think):
> ----------------------
> select bitstream_id
> from most_recent_checksum where to_be_processed = true
> order by date_trunc('milliseconds', last_process_end_date),
> bitstream_id ASC LIMIT 1;
>
> This statement requests the next bitstream to check and it takes +10
> seconds to find the next bitstream on our 320.000+ dspace.I don't *know* what is causing this, and I'm not yet a DBMS performance expert, but my eye is attracted to the ORDER BY clause. The monitor is having to build a temporary table of some probably very sizable subset of 320.000 records, and sort it. There is no index on last_process_end_date. That may not be significant, but I suppose that the query planner might be able to eliminate the sorting if that column were indexed. It might be able to make other optimizations as well, but I'm just guessing. Anyway, assuming you are using PostgreSQL, you could try using EXPLAIN to check out that command. Just connect 'psql' to the correct database and insert EXPLAIN in front of your query, and you will get information on how the query planner views that query and potentially spot the part that is costing so much. See "Performance Tips", and especially "Using EXPLAIN", in the Pg documentation. -- Mark H. Wood, Lead System Programmer [email protected] Friends don't let friends publish revisable-form documents.
pgpW9cFnJiiET.pgp
Description: PGP signature
------------------------------------------------------------------------------ Stay on top of everything new and different, both inside and around Java (TM) technology - register by April 22, and save $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco. 300 plus technical and hands-on sessions. Register today. Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech

