Hi John,
[EMAIL PROTECTED] wrote:
I have tried jcr/jackrabbit and like it.
Next I would like to push jackrabbit to its limit:
load in as many items as possible. I would appreciate help on
a few configuration/tuning issues:
(1) which persistent manager to use?
in a recent test I imported over a million wikipedia articles which
resulted in about 6 million items. no versioning, btw.
my configuration is:
dell latitude d505
db-based persistence using derby
256m heap
at the beginning the time to add an article was about 5ms.
towards the end of the load the time to add an article was stable at
about 50ms.
some other figures:
db size: 2 GB
index size: 300 MB
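for reference, a derby persistence manager is configured per workspace in repository.xml. a minimal sketch (class name and parameters are from the jackrabbit 1.x line and may differ in your version, so check the inline dtd):

```xml
<!-- goes inside the Workspace element of repository.xml -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.db.DerbyPersistenceManager">
  <!-- embedded derby database stored under the workspace home -->
  <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
  <!-- prefix for the tables jackrabbit creates -->
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```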
(2) what parameters to tune?
I can give you some advice on configuring the index: the default config
will cause lucene to create segments of 100 nodes, which will be merged
as soon as 10 segments exist. when doing a bulk load you should set
the parameter minMergeDocs to a higher value, e.g. 1000. this will create
segments of 1000 nodes, and will be more efficient.
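concretely, minMergeDocs is set as a parameter on the SearchIndex element in the workspace config. a sketch (the class name is standard, the path value is just an example):

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- where the lucene index files are stored -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- buffer 1000 nodes per segment before writing, better for bulk loads -->
  <param name="minMergeDocs" value="1000"/>
</SearchIndex>
```

after the bulk load you can set it back to a lower value for normal operation.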
(3) will multiple workspaces help?
IMO this might help, if you run into scalability issues with the
persistence manager you are using.
(4) any other things to watch for?
use separate disks for the index and workspace data.
My host has 4GB ram and a few TB diskspace.
Also, any doc describing all possible elements in repository.xml?
the sample repository.xml file in src/conf contains an inline dtd that
contains some documentation.
And if SearchIndex can be turned off?
yes, this is possible. you simply omit the SearchIndex element in the
configuration. though, I would be very interested to see how well the
index works with your data.
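to illustrate, a workspace configured without search might look like this (persistence manager shown only as an example, use whatever you have configured):

```xml
<Workspace name="${wsp.name}">
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${wsp.home}"/>
  </FileSystem>
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.DerbyPersistenceManager">
    <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
  </PersistenceManager>
  <!-- SearchIndex element omitted: no lucene index is maintained
       and queries on this workspace are unavailable -->
</Workspace>
```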
regards
marcel