[email protected] wrote:
Yes, I overrode the read() method in
FSDirectory.FSIndexInput.Descriptor and forced it to read in 50 MB
chunks, doing an arraycopy() into the array created by Lucene. It
now works with any heap size and no longer hits OOM.
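Roughly, that workaround looks like the following (a simplified
sketch only; the real override lives inside FSDirectory.FSIndexInput,
and the class and method names below are illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkedReader {
    private static final int CHUNK_SIZE = 50 * 1024 * 1024; // 50 MB per read

    // Read len bytes into b starting at offset, issuing one bounded
    // read per chunk and copying through a temporary buffer.
    static void readChunked(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        byte[] chunk = new byte[Math.min(len, CHUNK_SIZE)];
        while (len > 0) {
            int toRead = Math.min(len, CHUNK_SIZE);
            file.readFully(chunk, 0, toRead);
            System.arraycopy(chunk, 0, b, offset, toRead);
            offset += toRead;
            len -= toRead;
        }
    }
}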
You shouldn't need the extra arraycopy: RandomAccessFile can
read into a particular offset/len inside the array. Does that not work?
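For the archives, a sketch of what that would look like, assuming the
same 50 MB chunking as above; RandomAccessFile.readFully(byte[], int, int)
fills the destination array at an offset, so the temporary buffer and
the arraycopy() both go away:

import java.io.IOException;
import java.io.RandomAccessFile;

class InPlaceChunkedReader {
    private static final int CHUNK_SIZE = 50 * 1024 * 1024; // 50 MB per read

    // Same chunking idea, but each bounded read lands directly in the
    // target array at the right offset, with no intermediate copy.
    static void readChunked(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        while (len > 0) {
            int toRead = Math.min(len, CHUNK_SIZE);
            file.readFully(b, offset, toRead); // fills b[offset .. offset+toRead)
            offset += toRead;
            len -= toRead;
        }
    }
}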
There may be other places in the Lucene code where this could happen.
At present it seems to be working fine for me on our largest index
(17 GB), but I have only retrieved the result size and haven't tried
accessing the data yet, so there may be other calls to read() with
large buffer sizes.
As this bug does not look like it will be fixed in the near future,
it might be a good idea to put a workaround in place in the Lucene
code. I think it would be safe to read in chunks of up to 100 MB
without a problem, and I don't think it would affect performance to
any great degree.
I agree. Can you open a Jira issue and post a patch?
It's pleasing to see that Lucene can easily handle such huge
indexes, although this bug is obviously quite an impediment to doing
so.
Yes indeed. This is one crazy bug.
Mike