Hi Marcelo,

> Today we have approx. 400,000 items in the repository.
>
> Consider three PDF files per day, with a variable number of pages
> (100 ~ 5000) each.
>
> We create the hierarchy Year (community) -> Month (community) -> Day
> (community) -> File (collection) for each file. For instance:
> 2008 -> Aug -> 13 -> My file 1; 2008 -> Aug -> 13 -> My file 2;
> 2008 -> Aug -> 13 -> My file 3.
> We upload each file page (a PDF of approx. 100 KB) as a different item.
> In short, we have to index hundreds of small items per day.
>
> The server configuration is:
>
> CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2992.52-MHz K8-class CPU)
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> real memory = 2147483648 (2048 MB)
> avail memory = 2056413184 (1961 MB)
>
> Postgres db running on the same machine.
>
> Disk (for DSpace and the database) is storage mounted over NFS.
Depending on where the bottleneck is (it could be in several places), you could try a few changes. Of course, all this advice should be taken with a pinch of salt, as everyone's system is different.

The first change I would try is to move the Postgres database (its data directory) and the search index files (dspace/search/) to a local (or locally attached) disk. This should be a lot faster: NFS is naturally slower than local disk and operates slightly differently, and both Postgres and Lucene recommend that you don't run them over NFS. Since the indexing work is primarily building search indexes in the database and in Lucene, you should see some speed improvement. NFS should be fine for the assetstore, though.

If you do move the search index and Postgres database files to a local disk, remember to include them in your backup schedule. (Some people don't back up their DSpace Lucene search index, as it changes often and can be regenerated from DSpace - but that is up to you.)

If you still need a speed improvement, then depending on your budget you could try a few options:

- Databases thrive with a lot of RAM: the more data they can cache in memory, the faster they will be. Luckily, RAM is quite cheap. Also read the documentation online about tuning Postgres to make sure it actually uses the RAM.
- If you have a bit more money, try buying a second server and running Postgres on that.

If you do find that any of these give a useful improvement, please report back - that way we'll be able to give useful advice to other people in a similar situation.
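To make the two suggestions above concrete, here is a rough sketch of the relevant settings. The paths and memory values are assumptions for a 2 GB machine - adjust them to your own install and read the Postgres tuning docs before committing to numbers. In dspace.cfg, the search.dir property points DSpace's Lucene index at a directory; in postgresql.conf, shared_buffers and effective_cache_size control how much RAM Postgres uses and assumes is available:

```
# dspace.cfg - keep the assetstore on NFS, but put the Lucene
# search index on a local disk (path is just an example):
search.dir = /usr/local/dspace-search

# postgresql.conf - example values for a machine with 2 GB RAM;
# tune after reading the PostgreSQL documentation:
shared_buffers = 256MB          # Postgres's own buffer cache
effective_cache_size = 1GB      # planner's estimate of OS + db cache
```

After changing search.dir you would stop Tomcat, move (or rebuild with index-all) the existing index into the new directory, and restart; Postgres needs a restart for shared_buffers to take effect.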
I hope that helps,

Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga | The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech