Hi Marcelo,

> Today we have approx. 400,000 items in the repository
> 
> Consider three PDF files per day, with a variable number of pages (100 ~ 
> 5000) each.
> 
> We create the hierarchy Year (community) -> Month (community) -> Day 
> (community) -> File (collection) for each file. For instance, 2008 -> Aug -> 
> 13 -> My file 1; 2008 -> Aug -> 13 -> My file 2; 2008 -> Aug -> 13 -> My file 
> 3.
> We upload each file page (a PDF of approx. 100 KB) as a different item.
> In short, we have to index hundreds of small items per day.
> 
> The server configuration is:
> 
> CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2992.52-MHz K8-class CPU)
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> real memory = 2147483648 (2048 MB)
> avail memory = 2056413184 (1961 MB)
> 
> Postgres db running in the same machine
> 
> Disk (for DSpace and database) is a storage volume mounted via NFS

Depending on where the bottleneck is (it could be in several places), you could 
try a few changes. Of course, all this advice should be taken with a pinch of 
salt, as everyone's system is different.

The first change I would try is to move the Postgres database (data directory) 
and search index files (dspace/search/) to a local (or locally attached) disk. 
This should be a lot faster: NFS is naturally slower than local disk, and its 
file locking and sync behaviour differ from a local filesystem, which is why 
both the Postgres and Lucene documentation recommend against running them over 
NFS. Since the indexing is primarily building search indexes in the database 
and in Lucene, you should see some speed improvement.
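
As a rough sketch of the moves (the paths here are made up - adjust them to 
your install; this assumes a stock DSpace 1.x layout and the FreeBSD Postgres 
port):

  # Stop Tomcat and Postgres, then copy both off the NFS mount:
  cp -Rp /nfs/dspace/search /usr/local/dspace-local/search
  cp -Rp /usr/local/pgsql/data /usr/local/pgsql-local/data

  # Point DSpace at the new index location in [dspace]/config/dspace.cfg:
  search.dir = /usr/local/dspace-local/search

  # Point the FreeBSD rc script at the new Postgres data directory, in
  # /etc/rc.conf (the variable name can vary with the port version):
  postgresql_data="/usr/local/pgsql-local/data"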

NFS should be fine for the assetstore though. Of course, if you move the search 
index and Postgres database files to a local disk, you'll need to remember to 
include them in your backup schedule. (Some people don't back up their DSpace 
Lucene search index, as it changes often and can be regenerated from DSpace - 
but that is up to you.)
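
If you do skip backing up the index, regenerating it is a single command 
(which one depends on your DSpace version):

  # DSpace 1.6 and later:
  [dspace]/bin/dspace index-init

  # DSpace 1.5.x and earlier:
  [dspace]/bin/index-all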

If you still need a speed improvement, then, depending on your budget, you 
could try a few options:

 - Databases thrive on RAM: the more data they can cache in memory, the faster 
they will be, and luckily RAM is quite cheap. Also read the online 
documentation on tuning Postgres to make sure it actually uses that RAM (see 
the sketch after this list).

 - If you have a bit more money, try buying a second server and running 
Postgres on that; DSpace only needs a one-line config change to point at it 
(also shown below).
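
To give a flavour of both options - treat the numbers below as a rough 
starting point for a 2 GB machine, not gospel (the Postgres tuning guides go 
into far more detail), and the hostname is made up:

  # postgresql.conf - let Postgres use a decent share of the RAM:
  shared_buffers = 256MB         # the default is only a few MB
  effective_cache_size = 1GB     # roughly what the OS can cache for you

  # dspace.cfg - point DSpace at Postgres running on a second server:
  db.url = jdbc:postgresql://dbserver.example.com:5432/dspace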

If you do find that any of these gives a useful improvement, please report 
back - that way we'll be able to give better advice to other people in a 
similar situation.

I hope that helps,


Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928

