Shai Erera wrote:
> Thanks for clearing that up. I thought I was missing something :-)

> No, I don't use term vectors, only stored and indexed fields (no
> norms or term vectors).

Hmm, then it's hard to explain why, when you set the buffer to 128 MB, you never saw the process's memory usage reach that level.

> As for the efficiency of RAM usage by IndexWriter, what would perform
> better: setting the RAM limit to 128 MB, or creating a RAMDirectory and
> adding it to an IndexWriter once it reaches 128 MB?

That is a good question. Early versions of LUCENE-843 actually flushed segments into a RAMDirectory and then, once that RAMDirectory was full, merged those segments into the real directory. That approach used only a fraction of the allowed RAM to hold the postings data.
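To make that two-tier scheme a bit more concrete, here is a rough cost model. It is only a sketch under loose assumptions, not the LUCENE-843 code: `TwoTierFlushModel`, its methods, and the `compactionRatio` parameter (how much smaller a flushed segment is assumed to be than the buffer it came from) are all hypothetical, and sizes are abstract byte counts rather than measured memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model of "flush to a RAMDirectory, then merge to the real
// directory". Not Lucene code; all sizes are abstract byte counts.
class TwoTierFlushModel {
    private final long postingsBudget;    // RAM reserved for the malleable postings buffer
    private final long ramDirBudget;      // RAM left over for flushed in-memory segments
    private final double compactionRatio; // assumed flushed-segment size relative to the buffer
    private long postingsUsed = 0;
    private final List<Long> ramSegments = new ArrayList<>();
    private final List<Long> diskSegments = new ArrayList<>();

    TwoTierFlushModel(long totalRam, double postingsFraction, double compactionRatio) {
        this.postingsBudget = (long) (totalRam * postingsFraction);
        this.ramDirBudget = totalRam - postingsBudget;
        this.compactionRatio = compactionRatio;
    }

    // Charge one document's postings cost; flush when the postings buffer is full.
    void addDocument(long postingsBytes) {
        postingsUsed += postingsBytes;
        if (postingsUsed >= postingsBudget) {
            flushToRamDir();
        }
    }

    // Write the buffer as a compact segment into the in-RAM tier.
    private void flushToRamDir() {
        ramSegments.add((long) (postingsUsed * compactionRatio));
        postingsUsed = 0;
        if (ramDirBytes() >= ramDirBudget) {
            mergeToRealDir();
        }
    }

    // Merge every in-RAM segment into a single segment in the real directory.
    private void mergeToRealDir() {
        diskSegments.add(ramDirBytes());
        ramSegments.clear();
    }

    long ramDirBytes() {
        long total = 0;
        for (long s : ramSegments) {
            total += s;
        }
        return total;
    }

    int ramSegmentCount() { return ramSegments.size(); }
    int diskSegmentCount() { return diskSegments.size(); }
}
```

With a total budget of 100, a postings fraction of 0.25 and a compaction ratio of 0.5, documents costing 5 each trigger a flush every 5 documents, and the seventh flush (document 35) pushes the in-RAM tier past its budget of 75 and causes one merge into the real directory.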

The final version, by contrast, just uses the entire buffer to hold the postings data, for simplicity.

You can directly see the inefficiency by looking at the size of the segments that are flushed: they are never the full size of the RAM buffer, due to the overhead of maintaining a malleable data structure that allows efficiently appending to the end of any term's posting list.
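One simplified way to see where that kind of overhead comes from: if every term keeps a growable array that doubles when full, appending to any term's list is cheap, but part of each array is always allocated and unfilled. The sketch below is a hypothetical illustration only, not how Lucene's indexing code actually lays out its postings; `PostingsBufferModel` and the initial capacity of 4 are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an in-RAM postings buffer that supports appending
// to the end of any term's posting list in amortized O(1).
class PostingsBufferModel {
    private static final int INITIAL_CAPACITY = 4; // assumed starting size per term

    // One term's posting list: a doc-ID array that doubles when it fills up.
    private static final class PostingList {
        int[] docIds = new int[INITIAL_CAPACITY];
        int size = 0;

        void append(int docId) {
            if (size == docIds.length) {
                docIds = Arrays.copyOf(docIds, docIds.length * 2);
            }
            docIds[size++] = docId;
        }
    }

    private final Map<String, PostingList> postings = new HashMap<>();

    void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new PostingList()).append(docId);
    }

    // Ints held in RAM across all posting lists (including unused slack).
    long allocatedInts() {
        long total = 0;
        for (PostingList p : postings.values()) {
            total += p.docIds.length;
        }
        return total;
    }

    // Ints that would actually be written out when the buffer is flushed.
    long usedInts() {
        long total = 0;
        for (PostingList p : postings.values()) {
            total += p.size;
        }
        return total;
    }
}
```

For example, adding term "a" for docs 0 through 4 and term "b" for doc 0 allocates 12 ints (8 + 4) but only 6 are used, so a flush would write half of what the buffer occupied. The model ignores hash-map and object overhead, which would widen the gap further.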

So I suspect you might actually get a decent performance gain by using a RAMDirectory as an intermediary, except that the stored fields and term vectors would just use up RAM unnecessarily there. Really, you'd need a RAMDirectory that can somehow pass those files through to the real directory.
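A pass-through directory could decide per file where it belongs, keeping postings-related files in RAM and sending stored-field and term-vector files straight to the real directory. This is a minimal sketch that uses plain in-memory maps in place of Lucene's Directory API, so `PassThroughDirectoryModel` and its methods are hypothetical. The routing uses the Lucene 2.x file extensions for stored fields (`.fdt`, `.fdx`) and term vectors (`.tvx`, `.tvd`, `.tvf`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical "pass-through" RAM directory: both stores are plain maps here,
// standing in for a RAMDirectory and the real on-disk directory.
class PassThroughDirectoryModel {
    // Stored fields (.fdt/.fdx) and term vectors (.tvx/.tvd/.tvf) bypass RAM.
    private static final Set<String> PASS_THROUGH_EXTENSIONS =
            Set.of("fdt", "fdx", "tvx", "tvd", "tvf");

    private final Map<String, byte[]> ram = new HashMap<>();
    private final Map<String, byte[]> backing; // stands in for the real directory

    PassThroughDirectoryModel(Map<String, byte[]> backing) {
        this.backing = backing;
    }

    static boolean passesThrough(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return PASS_THROUGH_EXTENSIONS.contains(ext);
    }

    // Route each file either to the RAM tier or directly to the backing store.
    void writeFile(String fileName, byte[] data) {
        if (passesThrough(fileName)) {
            backing.put(fileName, data);
        } else {
            ram.put(fileName, data);
        }
    }

    boolean inRam(String fileName) {
        return ram.containsKey(fileName);
    }

    // Bytes currently occupying the RAM tier (postings-related files only).
    long ramBytes() {
        long total = 0;
        for (byte[] b : ram.values()) {
            total += b.length;
        }
        return total;
    }
}
```

Writing a 10-byte `_0.frq` and a 100-byte `_0.fdt` leaves only the 10 postings bytes in RAM; the stored-fields file goes directly to the backing store, so the RAM budget is spent only on data that benefits from being merged in memory.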

If you do some testing here, please post the results! I think this is a potential core change that could still give IndexWriter's throughput a sizable further boost.

Mike
