[email protected] wrote:
Yes, I overrode the read() method in
FSDirectory.FSIndexInput.Descriptor and forced it to read in 50 MB
chunks, doing an arraycopy() into the array created by Lucene. It
now works with any heap size and no longer hits OOM.
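Roughly, that workaround looks like the following (a simplified
sketch only; the real override lives inside FSDirectory.FSIndexInput,
and the class and method names below are illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkedReader {
    private static final int CHUNK_SIZE = 50 * 1024 * 1024; // 50 MB per read

    // Read len bytes into b starting at offset, issuing one bounded
    // read per chunk and copying through a temporary buffer.
    static void readChunked(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        byte[] chunk = new byte[Math.min(len, CHUNK_SIZE)];
        while (len > 0) {
            int toRead = Math.min(len, CHUNK_SIZE);
            file.readFully(chunk, 0, toRead);
            System.arraycopy(chunk, 0, b, offset, toRead);
            offset += toRead;
            len -= toRead;
        }
    }
}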
You shouldn't need the extra arraycopy: RandomAccessFile can
read into a particular offset/len inside the array. Does that not work?
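For the archives, a sketch of what that would look like, assuming the
same 50 MB chunking as above; RandomAccessFile.readFully(byte[], int, int)
fills the destination array at an offset, so the temporary buffer and
the arraycopy() both go away:

import java.io.IOException;
import java.io.RandomAccessFile;

class InPlaceChunkedReader {
    private static final int CHUNK_SIZE = 50 * 1024 * 1024; // 50 MB per read

    // Same chunking idea, but each bounded read lands directly in the
    // target array at the right offset, with no intermediate copy.
    static void readChunked(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        while (len > 0) {
            int toRead = Math.min(len, CHUNK_SIZE);
            file.readFully(b, offset, toRead); // fills b[offset .. offset+toRead)
            offset += toRead;
            len -= toRead;
        }
    }
}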
There may be other places in the Lucene code where this could happen.
At present it seems to be working fine for me on our largest index
(17 GB), but I have only retrieved the result size and haven't tried
accessing the data yet, so there may be other calls to read() with
large buffer sizes.
As this bug does not look like it will be fixed in the near future,
it might be a good idea to put a workaround in place in the Lucene
code. I think it would be safe to read in chunks of up to 100 MB
without a problem, and I don't think it would affect performance to
any great degree.
I agree. Can you open a Jira issue and post a patch?
It's pleasing to see that Lucene can easily handle such huge
indexes, although this bug is obviously quite an impediment to doing
so.
Yes indeed. This is one crazy bug.
Mike