I can attest to the value of increasing minMergeDocs: it directly affects how much IO gets performed during indexing. Splitting the work into multiple indices (if you want to pay the price in complexity) may well increase your throughput, assuming you are not already utilizing all of the resources the system offers. Say, for example, you have two indexing threads and one writer per thread. You can benefit in a few ways here. First, indexing is a mixture of CPU-bound and IO-bound work (an effect that is certainly easier to observe when you increase minMergeDocs). If you have an SMP or HT box, then you potentially have two "hardware threads" to use concurrently. Further, you get more chance to overlap IO. A quick profiler run may also give you clues about how inefficient your own code is.
Volodymyr Bychkoviak <[EMAIL PROTECTED]> wrote:

JM Tinghir wrote:
>> Could you qualify a bit more about what is slow?
>
> Well, it just took 145 minutes to index 2670 files (450 MB) into one
> index (29 MB). It only took 33 minutes when I did it into ~10 indexes
> (global size of 32 MB).

I think it took so much time because the index is merged too often. Try
increasing IndexWriter.mergeFactor (default 10), but be aware of a "too
many open files" exception when you set it too high. Also try increasing
IndexWriter.minMergeDocs (default 10); it consumes more RAM but works
faster. By playing a bit with these parameters you can speed up your
indexing process.

>> Perhaps you need to optimize the index?
>
> Perhaps, never tried it...
>
> JM

regards,
Volodymyr Bychkoviak