Re: Does a RAMDirectory ever need to merge segments... (performance issue)

2004-04-21 Thread Gerard Sychay
I've always wondered about this too.  To put it another way, how does
mergeFactor affect an IndexWriter backed by a RAMDirectory?  Can I set
mergeFactor to the highest possible value (given the machine's RAM) in
order to avoid merging segments?
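
For illustration, a minimal sketch of what that tuning would look
like against the Lucene 1.3-era API, where mergeFactor, minMergeDocs,
and maxMergeDocs are public int fields on IndexWriter (the class name
and values below are made up for the example, not recommendations):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.RAMDirectory;

  public class RamDirTuning {
      public static void main(String[] args) throws Exception {
          RAMDirectory ramDir = new RAMDirectory();
          IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);

          // Raise the merge thresholds so segments buffered in RAM
          // are merged as rarely as possible.
          writer.mergeFactor = 1000;    // default is 10
          writer.minMergeDocs = 100000; // default is 10

          Document doc = new Document();
          doc.add(Field.Text("body", "hello world"));
          writer.addDocument(doc);
          writer.close();
      }
  }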

 Kevin A. Burton [EMAIL PROTECTED] 04/20/04 04:40AM 
I've been benchmarking our indexer to find out if I can squeeze any
more performance out of it.

I noticed one problem with RAMDirectory... I'm storing documents in 
memory and then writing them to disk every once in a while. ...

IndexWriter.maybeMergeSegments is taking up 5% of total runtime. 
DocumentWriter.addDocument is taking up another 17% of total runtime.

Notice that these don't add up to 100% because there are other tasks
taking up CPU before and after Lucene is called.

Anyway... I don't see why RAMDirectory is trying to merge segments.
Is there any way to prevent this?  I could just store the documents
in a big ArrayList until I'm ready to write them to a disk index, but
I'm not sure how efficient that would be.
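
For what it's worth, a rough sketch of that buffering approach
(entirely illustrative; BufferedIndexer and FLUSH_THRESHOLD are
invented for this example) might look like:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.Iterator;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;

  // Buffer Documents in a plain ArrayList and add them to the
  // on-disk index in one batch; no RAMDirectory (and hence no
  // in-memory segment merging) is involved.
  public class BufferedIndexer {
      private static final int FLUSH_THRESHOLD = 1000; // arbitrary batch size
      private final ArrayList buffer = new ArrayList();
      private final IndexWriter diskWriter;

      public BufferedIndexer(IndexWriter diskWriter) {
          this.diskWriter = diskWriter;
      }

      public void add(Document doc) throws IOException {
          buffer.add(doc);
          if (buffer.size() >= FLUSH_THRESHOLD)
              flush();
      }

      public void flush() throws IOException {
          for (Iterator it = buffer.iterator(); it.hasNext();)
              diskWriter.addDocument((Document) it.next());
          buffer.clear();
      }
  }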

Anyone run into this before?





Re: Does a RAMDirectory ever need to merge segments... (performance issue)

2004-04-21 Thread Kevin A. Burton
Gerard Sychay wrote:

I've always wondered about this too.  To put it another way, how does
mergeFactor affect an IndexWriter backed by a RAMDirectory?  Can I set
mergeFactor to the highest possible value (given the machine's RAM) in
order to avoid merging segments?
 

Yes... actually I was thinking of increasing these vars on the
RAMDirectory in the hope of avoiding this CPU overhead.

Also, I think the var you want is minMergeDocs, not mergeFactor.  The
only problem is that the source to maybeMergeSegments reads:

  private final void maybeMergeSegments() throws IOException {
    long targetMergeDocs = minMergeDocs;
    while (targetMergeDocs <= maxMergeDocs) {

So I guess to prevent this we would have to set minMergeDocs to
maxMergeDocs+1, which makes no sense.  Also, by default maxMergeDocs
is Integer.MAX_VALUE, so that would have to be changed as well.
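
In other words, the only way to make that loop a no-op would be
something like this (illustrative only; as noted, the defaults make
it awkward, and the exact values here are made up):

  // Pick a finite maxMergeDocs, then push minMergeDocs past it so
  // that "while (targetMergeDocs <= maxMergeDocs)" never iterates.
  writer.maxMergeDocs = 1000000;                 // default is Integer.MAX_VALUE
  writer.minMergeDocs = writer.maxMergeDocs + 1;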

Anyway... I'm still playing with this myself. It might be easier to just 
use an ArrayList of N documents if you know for sure how big your RAM 
dir will grow to.

Kevin

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


