I have had serious performance problems with the bitstream checker when running
DSpace 1.4.x.
 
We have more than 320.000 bitstreams, and the number is increasing continuously.
 
 
Problem 1:
----------------
Starting the checker with:  checker -d 10h
 
resulted in several days of initialisation.
 
Resolution
----------------
Cause: a missing index on checksum_history(bitstream_id). Adding it fixed the problem:
CREATE INDEX checksum_history_idx ON checksum_history(bitstream_id);

 
Now it starts up in a reasonable time.
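 
For anyone applying the same fix, here is a minimal sketch of how to check that
the index is in place. It assumes the database is PostgreSQL; the pg_indexes
view and the ANALYZE step are standard PostgreSQL, not part of the original fix.

```sql
-- List the indexes currently defined on checksum_history
-- (assumes PostgreSQL; pg_indexes is a standard system view).
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'checksum_history';

-- Refresh planner statistics for the table after creating the index.
ANALYZE checksum_history;
```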
 
 
Problem 2:
---------------
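checker -d 10h  processes approximately 250 bitstreams / hour.
 
checker -d 1874/12233 (collection handle) processes approximately 20.000
bitstreams / hour.
 
So running the checker every night for, e.g., 10 hours is not workable: at 250
bitstreams / hour a 10-hour run covers about 2.500 bitstreams, so one full pass
over 320.000 bitstreams would take roughly 128 nights.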
 
 
Cause (I think):
----------------------
select bitstream_id  
            from most_recent_checksum where to_be_processed = true 
            order by date_trunc('milliseconds', last_process_end_date),
            bitstream_id ASC LIMIT 1;

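This statement requests the next bitstream to check, and it takes more than 10
seconds to find the next bitstream on our DSpace with 320.000+ bitstreams. At
10 seconds per lookup, that alone caps throughput at 360 bitstreams / hour,
which is consistent with the roughly 250 / hour observed.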
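 
One way to confirm this suspicion is sketched below. It is untested and rests
on two assumptions: that the database is PostgreSQL, and that
last_process_end_date is a timestamp without time zone (otherwise date_trunc
cannot be used in an index expression). The index name is hypothetical.

```sql
-- Show the actual plan and timing of the "next bitstream" lookup.
EXPLAIN ANALYZE
SELECT bitstream_id
  FROM most_recent_checksum
 WHERE to_be_processed = true
 ORDER BY date_trunc('milliseconds', last_process_end_date),
          bitstream_id ASC
 LIMIT 1;

-- Candidate partial expression index that matches the WHERE and ORDER BY
-- clauses above, so the first row could be read without sorting all rows.
-- The index name is hypothetical; this is a suggestion, not a verified fix.
CREATE INDEX most_recent_checksum_next_idx
    ON most_recent_checksum
       (date_trunc('milliseconds', last_process_end_date), bitstream_id)
 WHERE to_be_processed = true;
```

If the first plan shows a sequential scan followed by a sort over the whole
table, re-running EXPLAIN ANALYZE after creating the index would show whether
the planner picks it up.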
 
 
Question:
How do I get this fixed in DSpace 1.5.x?
 
 
Peter
 
 


Peter Ruijgrok | University Library Utrecht | www.uu.nl/library
<http://www.uu.nl/library>  | [email protected] | +31.30.253 6553 |
+31.6.10168880| Adres, Utrecht | room 1329 | Mon - Fri

 

 


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
