Hi

I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may
sound like it belongs to the user list, but I actually think there is a
problem with it, so I'm posting it to the dev list.

I'm using 2.3.1 to index a set of documents (500K Amazon books to be exact).
I don't use norms and most of the fields I index are also stored. I'm
setting IndexWriter like this:
            indexwriter.setRAMBufferSizeMB(128);
            indexwriter.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);

AFAIU, the first line sets the RAM used by IW for buffering to 128MB and the
second disables flushing by doc count. Naturally, I'd expect nothing to be
written to the file system until those 128MB are consumed. However, that
does not seem to be the case. I watch the index directory, refreshing it
periodically (on Windows), and I see the .fdt file grow by a few KB at a
time while documents are being added. Task Manager also shows the
application is not consuming 128MB ...
So I debug-traced the application and noticed the following:
- DocumentsWriter calls fieldsWriter.flushDocument in writeDocument(),
passing a RAMOutputStream instance (fdtLocal).
- FieldsWriter calls RAMOutputStream.writeTo() and passes fieldsStream,
which is of type FSIndexOutput.
- FSIndexOutput maintains an internal buffer of size 16KB (fixed) and
eventually flushes the buffer to the RandomAccessFile it maintains.
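
To make the path above concrete, here is a minimal, self-contained sketch in
plain Java. It is not Lucene's code: StoredFieldsFlushSketch, CountingSink,
FixedBufferOutput and indexDocs are hypothetical names, and the 1KB of stored
bytes per document is an assumed figure. It only models what I traced: a
per-document in-memory buffer copied into an output with a fixed 16KB buffer,
with no RAM budget consulted anywhere on that path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StoredFieldsFlushSketch {

    /** Stands in for the file on disk: counts writes and bytes reaching it. */
    static final class CountingSink extends OutputStream {
        long bytesWritten = 0;
        int writeCalls = 0;

        @Override
        public void write(int b) {
            bytesWritten++;
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytesWritten += len;
            writeCalls++;
        }
    }

    /** Hypothetical stand-in for an output with a fixed-size internal buffer. */
    static final class FixedBufferOutput {
        private final byte[] buffer;
        private final OutputStream sink;
        private int used = 0;

        FixedBufferOutput(OutputStream sink, int bufferSize) {
            this.sink = sink;
            this.buffer = new byte[bufferSize];
        }

        void write(byte[] src, int off, int len) throws IOException {
            while (len > 0) {
                int n = Math.min(len, buffer.length - used);
                System.arraycopy(src, off, buffer, used, n);
                used += n;
                off += n;
                len -= n;
                if (used == buffer.length) {
                    flush(); // buffer full: bytes go to the sink right away
                }
            }
        }

        void flush() throws IOException {
            if (used > 0) {
                sink.write(buffer, 0, used);
                used = 0;
            }
        }
    }

    /** Adds numDocs documents, each with storedBytesPerDoc bytes of stored fields. */
    static CountingSink indexDocs(int numDocs, int storedBytesPerDoc, int bufferSize)
            throws IOException {
        CountingSink disk = new CountingSink();
        FixedBufferOutput fieldsStream = new FixedBufferOutput(disk, bufferSize);
        byte[] payload = new byte[storedBytesPerDoc];
        for (int i = 0; i < numDocs; i++) {
            // Per-document in-memory buffer (the role fdtLocal plays in the trace).
            ByteArrayOutputStream fdtLocal = new ByteArrayOutputStream();
            fdtLocal.write(payload, 0, payload.length);
            byte[] docBytes = fdtLocal.toByteArray();
            // Copied into the fixed-buffer output as soon as the document is done.
            fieldsStream.write(docBytes, 0, docBytes.length);
        }
        return disk;
    }

    public static void main(String[] args) throws IOException {
        CountingSink disk = indexDocs(100, 1024, 16 * 1024);
        // prints "6 writes, 98304 bytes"
        System.out.println(disk.writeCalls + " writes, " + disk.bytesWritten + " bytes");
    }
}
```

With 100 documents of 1KB each, the sink sees a write after every 16
documents, which matches the small, frequent growth of the .fdt file that I
observe, regardless of how large a RAM buffer is configured elsewhere.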

Nowhere along this path did I see the 128MB setting being applied, AFAIK.

Can someone please explain to me how this works? Am I missing something
(maybe a patch post 2.3.1)?

One other thing I forgot to mention: I started this investigation after
playing with the RAM buffer size and maxBufferedDocs (MBD). Setting MBD to
10,000 resulted in the same indexing performance as setting the RAM buffer
to 128MB, but it consumed much less RAM (~70MB according to Windows' Task
Manager, which is not the most accurate measure).

Thanks in advance,
Shai
