Hi, I have a question about the RAMBufferSizeMB setting on IndexWriter. It may sound like it belongs on the user list, but I think there is actually a problem with it, so I'm posting it to the dev list.
I'm using 2.3.1 to index a set of documents (500K Amazon books, to be exact). I don't use norms, and most of the fields I index are also stored. I'm configuring IndexWriter like this:

    indexwriter.setRAMBufferSizeMB(128);
    indexwriter.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);

AFAIU, the first line limits IW's RAM usage to 128MB and the second disables flushing by document count. Naturally, I'd expect nothing to be written to the file system until those 128MB are consumed. However, that does not seem to be the case. I watch the file system with periodic refreshes (Windows) and notice that data gets written to disk (the .fdt file) every few KB. Task Manager shows the application is not consuming 128MB...

So I debug-traced the application and noticed the following:
- DocumentsWriter calls fieldsWriter.flushDocument in writeDocument(), passing a RAMOutputStream instance (fdtLocal).
- FieldsWriter calls RAMOutputStream.writeTo() and passes fieldsStream, which is of type FSIndexOutput.
- FSIndexOutput maintains a fixed 16KB internal buffer and eventually flushes it to the RandomAccessFile it maintains.

So far, AFAIK, the 128MB setting has not been applied anywhere. Can someone please explain to me how this works? Am I missing something (maybe a patch after 2.3.1)?

One other thing I forgot to mention: I started this investigation after experimenting with the RAM buffer size and maxBufferedDocs settings. Setting maxBufferedDocs (MBD) to 10,000 resulted in the same performance as setting RAM to 128MB, but it consumed much less RAM (~70MB according to Windows' Task Manager, which is not the most accurate tool).

Thanks in advance,
Shai
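For illustration, the write path traced above can be modelled with a small, self-contained Java program. This is a hypothetical sketch, not Lucene's code: FixedBufferOutput, simulate() and the 2KB document size are assumptions chosen to mirror fdtLocal being copied into a 16KB FSIndexOutput buffer, with a ByteArrayOutputStream standing in for the RandomAccessFile. In this model, bytes reach the "file" every time the 16KB buffer fills, and no RAM budget is consulted anywhere.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical model of the stored-fields write path traced above; NOT Lucene code.
public class StoredFieldsFlushModel {

    // Fixed buffer size, mirroring the 16KB FSIndexOutput buffer; no RAM budget is involved.
    public static final int BUFFER_SIZE = 16 * 1024;

    // Loosely modelled on FSIndexOutput: a fixed buffer in front of a "file".
    public static final class FixedBufferOutput {
        private final byte[] buffer = new byte[BUFFER_SIZE];
        private int pos = 0;
        // Stands in for the RandomAccessFile on disk.
        private final ByteArrayOutputStream file = new ByteArrayOutputStream();
        public int flushCount = 0;

        public void writeBytes(byte[] b, int off, int len) {
            while (len > 0) {
                int n = Math.min(len, BUFFER_SIZE - pos);
                System.arraycopy(b, off, buffer, pos, n);
                pos += n;
                off += n;
                len -= n;
                if (pos == BUFFER_SIZE) {
                    flush(); // buffer full: bytes go to the "file" right away
                }
            }
        }

        public void flush() {
            if (pos > 0) {
                file.write(buffer, 0, pos);
                pos = 0;
                flushCount++;
            }
        }

        public int bytesOnDisk() {
            return file.size();
        }
    }

    // Writes numDocs documents of docBytes stored-field bytes each.
    public static FixedBufferOutput simulate(int numDocs, int docBytes) {
        FixedBufferOutput fieldsStream = new FixedBufferOutput();
        byte[] doc = new byte[docBytes];
        Arrays.fill(doc, (byte) 'x');
        for (int d = 0; d < numDocs; d++) {
            // Per-document RAM buffer, playing the role of fdtLocal.
            ByteArrayOutputStream perDoc = new ByteArrayOutputStream();
            perDoc.write(doc, 0, doc.length);
            byte[] bytes = perDoc.toByteArray();
            // Copy into the shared fixed buffer, like RAMOutputStream.writeTo(fieldsStream).
            fieldsStream.writeBytes(bytes, 0, bytes.length);
        }
        return fieldsStream;
    }

    public static void main(String[] args) {
        FixedBufferOutput out = simulate(1000, 2048);
        System.out.println(out.flushCount + " writes, " + out.bytesOnDisk() + " bytes on disk");
    }
}
```

With 1,000 documents of 2KB each, the model performs 125 writes of 16KB to the underlying stream; with only 7 such documents (14KB), nothing has reached it yet.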