Hi John,

[EMAIL PROTECTED] wrote:
I have tried jcr/jackrabbit and like it.
Next I would like to push jackrabbit to its limit:
load in as many items as possible. I would appreciate help on
a few configuration/tuning issues:
(1) which persistent manager to use?

in a recent test I imported over a million wikipedia articles which resulted in about 6 million items. no versioning, btw.

my configuration is:
dell latitude d505
db-persistence using derby
256m heap

at the beginning the time to add an article was about 5ms.
towards the end of the load the time to add an article was stable at about 50ms.

some other figures:
db size: 2 GB
index size: 300 MB

(2) what parameters to tune?

I can give you some advice on configuring the index: the default config will cause lucene to create segments of 100 nodes, which will be merged as soon as 10 segments exist. when doing a bulk load you should set the parameter minMergeDocs to a higher value, e.g. 1000. this will create segments of 1000 nodes, and will be more efficient.
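
as a rough sketch, the SearchIndex element inside the Workspace element of repository.xml could look like this for a bulk load. the class name and the path value are taken from the usual sample configuration and are assumptions about your setup, so check them against your own file:

```xml
<!-- sketch only: class and path assume the standard sample config -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- larger segments for bulk loading (default is 100) -->
  <param name="minMergeDocs" value="1000"/>
</SearchIndex>
```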

(3) will multiple workspaces help?

IMO this might help, if you run into scalability issues with the persistence manager you are using.

(4) any other things to watch for?

use separate disks for the index and workspace data.
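
for the index, one way to do this is to point the path parameter of the SearchIndex element at a directory on the other disk. a minimal sketch, where /mnt/disk2/jackrabbit/index is a made-up mount point and the class name assumes the standard sample config:

```xml
<!-- sketch only: /mnt/disk2/jackrabbit/index is a hypothetical location on a second disk -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="/mnt/disk2/jackrabbit/index"/>
</SearchIndex>
```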

My host has 4GB ram and a few TB diskspace.

Also, any doc describing all possible elements in repository.xml?

the sample repository.xml file in src/conf includes an inline dtd with some documentation.

And can SearchIndex be turned off?

yes, this is possible. you simply omit the SearchIndex element in the configuration. though, I would be very interested to see how well the index works with your data.

regards
 marcel
