Shai Erera wrote:
Thanks for clearing that up. I thought I was missing something :-)
No .. I don't use term vectors, only stored fields and indexed ones,
no norms or term vectors.
Hmm, then it's hard to explain why, when you set the RAM buffer to
128 MB, you never saw the process get up to that usage.
As for the efficiency of RAM usage by IndexWriter - what would perform
better: setting the RAM limit to 128 MB, or creating a RAMDirectory
and adding it to an IndexWriter once it reaches 128 MB?
That is a good question. Early versions of LUCENE-843 actually
flushed segments into a RAMDirectory and then, once that RAMDir was
full, merged the segments to the real directory, using only a
fraction of the allowed RAM to hold the postings data. The final
version (for simplicity) just uses the entire buffer to hold the
postings data.
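
To make the difference between the two schemes concrete, here is a rough
toy model in plain Java. It is not Lucene code and does not use any Lucene
API: the class name, the abstract size units, and especially the 50%
"flushed size versus buffered size" ratio are assumptions made up for
illustration. It also assumes that merging staged segments neither shrinks
nor grows them. With the same 128 units of RAM, it compares flushing a
full postings buffer straight to disk against flushing a small postings
buffer into an in-RAM staging area that is merged to disk when full.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: sizes are abstract units and nothing here is Lucene API. */
class TwoTierFlushModel {
    private final long postingsBudget; // RAM for the append-friendly postings buffer
    private final long stagingBudget;  // RAM for compact staged segments; 0 disables staging
    private final int compactPercent;  // ASSUMED flushed size as a percent of buffered size
    private long buffered = 0;
    private long staged = 0;
    private final List<Long> diskSegments = new ArrayList<>();

    TwoTierFlushModel(long postingsBudget, long stagingBudget, int compactPercent) {
        this.postingsBudget = postingsBudget;
        this.stagingBudget = stagingBudget;
        this.compactPercent = compactPercent;
    }

    /** Buffer one document's postings; flush when the postings buffer is full. */
    void addDocument(long bufferedSize) {
        buffered += bufferedSize;
        if (buffered >= postingsBudget) {
            flush();
        }
    }

    private void flush() {
        long segment = buffered * compactPercent / 100;
        buffered = 0;
        if (stagingBudget == 0) {
            diskSegments.add(segment); // single tier: every flush goes straight to disk
            return;
        }
        staged += segment;             // two tier: park the compact segment in RAM
        if (staged >= stagingBudget) {
            diskSegments.add(staged);  // merge all staged segments into one on disk
            staged = 0;
        }
    }

    List<Long> diskSegments() {
        return diskSegments;
    }

    public static void main(String[] args) {
        TwoTierFlushModel single = new TwoTierFlushModel(128, 0, 50);
        TwoTierFlushModel twoTier = new TwoTierFlushModel(32, 96, 50);
        for (int i = 0; i < 768; i++) {
            single.addDocument(1);
            twoTier.addDocument(1);
        }
        System.out.println("single-tier disk segments: " + single.diskSegments());
        System.out.println("two-tier disk segments:    " + twoTier.diskSegments());
    }
}
```

Under these made-up numbers the two-tier variant writes fewer, larger
segments to disk for the same total RAM. Whether that carries over to a
real index depends on the actual overhead ratio and on the cost of the
extra in-RAM merge, neither of which this model measures.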
You can directly see the inefficiency by looking at the size of the
segments that are flushed: they are never the full size of the RAM
buffer, due to the overhead of maintaining a malleable data structure
that allows efficient appends to the end of any term's posting list.
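
As a rough illustration of where that overhead comes from, the sketch
below keeps each term's posting list in an int array that doubles when it
fills up. This is a deliberately simplified stand-in, not the data
structure IndexWriter actually uses; the class and method names are
hypothetical. It compares the bytes reserved by the growable arrays with
the bytes a flush would need if it wrote the same doc IDs packed back to
back.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: not Lucene's actual in-memory postings format. */
class GrowablePostingsBuffer {
    /** One term's posting list: an int array that doubles when full. */
    private static final class Postings {
        int[] docIds = new int[4];
        int count = 0;

        void append(int docId) {
            if (count == docIds.length) {
                // Doubling keeps appends cheap but leaves unused slack behind.
                docIds = Arrays.copyOf(docIds, docIds.length * 2);
            }
            docIds[count++] = docId;
        }
    }

    private final Map<String, Postings> byTerm = new HashMap<>();

    void add(String term, int docId) {
        byTerm.computeIfAbsent(term, t -> new Postings()).append(docId);
    }

    /** Bytes reserved by the arrays (object and map overhead would add more). */
    long allocatedBytes() {
        long total = 0;
        for (Postings p : byTerm.values()) {
            total += (long) p.docIds.length * Integer.BYTES;
        }
        return total;
    }

    /** Bytes a flush would write if it stored only the doc IDs, packed tightly. */
    long flushedBytes() {
        long total = 0;
        for (Postings p : byTerm.values()) {
            total += (long) p.count * Integer.BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        GrowablePostingsBuffer buf = new GrowablePostingsBuffer();
        for (int doc = 0; doc < 1000; doc++) {
            buf.add("common", doc);             // appears in every document
            buf.add("rare" + (doc % 100), doc); // 100 rarer terms, 10 docs each
        }
        System.out.println("allocated=" + buf.allocatedBytes()
                + " flushed=" + buf.flushedBytes());
    }
}
```

Even in this toy version the reserved size is always at least the flushed
size, and a real flushed segment would typically be smaller still because
postings are written in a compressed form.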
But I suspect you may actually get a decent performance gain if you
do use a RAMDirectory as an intermediary, except for the stored
fields / term vectors, which would just use up RAM unnecessarily.
Really you need a RAMDirectory that can somehow pass those files
through.
If you do some testing here please post back the results! I think
this is a potential core change that could still give a sizable
further performance gain to IndexWriter's throughput.
Mike