I can attest to the value of increasing minMergeDocs... it directly affects 
how much IO gets performed during indexing.
 
Splitting it into multiple indices (if you want to pay the price of 
complexity) may well increase your throughput, assuming you are not already 
utilizing all of the resources the system offers. Say, for example, you have 
two indexing threads and one writer per thread. You can benefit in a few ways 
here. First, indexing is a mixture of CPU-bound and IO-bound work (certainly 
easier to observe that effect when you increase minMergeDocs). If you have an 
SMP or HT box, you potentially have two "hardware threads" to use 
concurrently. Further, you will have more chance of overlapping IO.
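
For illustration, here is a minimal sketch of that two-writer setup, assuming 
the Lucene 1.4-era API; the index paths, the partitioning of the input, and 
the sample texts are all hypothetical:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ParallelIndexer {

        // Index one partition of the input into its own index directory.
        static void indexPartition(String indexDir, String[] texts)
                throws IOException {
            IndexWriter writer =
                new IndexWriter(indexDir, new StandardAnalyzer(), true);
            for (String text : texts) {
                Document doc = new Document();
                doc.add(Field.Text("contents", text));
                writer.addDocument(doc);
            }
            writer.close();
        }

        public static void main(String[] args) throws Exception {
            // One writer per thread, each targeting its own directory, so
            // the CPU-bound and IO-bound phases of the writers can overlap.
            final String[] half1 = { "first half of the documents..." };
            final String[] half2 = { "second half of the documents..." };
            Thread t1 = new Thread(new Runnable() {
                public void run() {
                    try { indexPartition("index-part1", half1); }
                    catch (IOException e) { e.printStackTrace(); }
                }
            });
            Thread t2 = new Thread(new Runnable() {
                public void run() {
                    try { indexPartition("index-part2", half2); }
                    catch (IOException e) { e.printStackTrace(); }
                }
            });
            t1.start(); t2.start();
            t1.join(); t2.join();
        }
    }

The resulting part indexes can then be searched together (e.g. with 
MultiSearcher) or merged into a single index with IndexWriter.addIndexes(), 
so the split need not be visible to searchers.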
 
A quick profiler run may also give you clues about where your code is inefficient.
  
C

Volodymyr Bychkoviak <[EMAIL PROTECTED]> wrote:


JM Tinghir wrote:

>>Could you qualify a bit more about what is slow? 
>> 
>>
>
>Well, it just took 145 minutes to index 2670 files (450 MB) in one
>index (29 MB).
>It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
>
>
> 
>
I think it took so much time because the index is merged too often.
Try increasing IndexWriter.mergeFactor (default 10), but be aware of a 
"too many open files" exception if you set it too high. Also try 
increasing IndexWriter.minMergeDocs (default 10); it consumes more RAM 
but works faster.

Playing a bit with these parameters, you can speed up your indexing process.
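
For instance, a minimal sketch under the Lucene 1.4-era API, where both 
settings are public fields on IndexWriter (the path, analyzer, and values 
chosen here are just examples):

    IndexWriter writer =
        new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    writer.mergeFactor = 50;     // merge segments less often, but watch the
                                 // open-file limit at high values
    writer.minMergeDocs = 1000;  // buffer more documents in RAM before
                                 // flushing a segment to disk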

>>Perhaps you need to optimize the index? 
>> 
>>
>
>Perhaps, never tried it...
>
>JM
>

regards,
Volodymyr Bychkoviak
