I've opened:
https://issues.apache.org/jira/browse/LUCENE-1566
for this. Cameron, could you attach your patch to that issue? Thanks.
Mike
[email protected] wrote:
Yes, I overrode the read() method in
FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb
chunks and do an arraycopy() into the array created by Lucene. It
now works with any heap size and doesn't get OOM.
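A minimal sketch of that chunked-read approach, not Cameron's actual patch: the class and method names here (ChunkedReadDemo, chunkedRead) and the chunk size are hypothetical, and the demo uses a small temp file rather than a real Lucene index. Note that RandomAccessFile.read(byte[], int, int) can write directly into the destination array at an offset, so an explicit arraycopy() is not strictly required:

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class ChunkedReadDemo {
    // Hypothetical chunk size; Cameron used 50Mb chunks, shrunk here so the demo runs quickly.
    static final int CHUNK_SIZE = 8 * 1024;

    // Fill b[offset..offset+len) with reads of at most CHUNK_SIZE bytes each,
    // instead of a single read(b, offset, len) call with a huge len, which is
    // what trips the native-IO bug on large heaps.
    static void chunkedRead(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int toRead = Math.min(CHUNK_SIZE, len - total);
            int n = file.read(b, offset + total, toRead);
            if (n < 0) {
                throw new EOFException("hit EOF after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small test file standing in for a large index file.
        File tmp = File.createTempFile("chunked-read", ".bin");
        tmp.deleteOnExit();
        byte[] expected = new byte[100_000];
        for (int i = 0; i < expected.length; i++) {
            expected[i] = (byte) (i % 251);
        }
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(expected);
        }

        // Read it back in chunks and verify the destination array is filled correctly.
        byte[] actual = new byte[expected.length];
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            chunkedRead(raf, actual, 0, actual.length);
        }
        System.out.println(Arrays.equals(expected, actual)); // prints "true"
    }
}
```

The loop also handles short reads (read() may return fewer bytes than requested), which a single large read call papers over.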
There may be other places in the Lucene code where this could happen
(at present it seems to be working fine for me on our largest index,
17Gb, but I haven't tried accessing data yet - only getting the result
size - so perhaps there are other calls to read() with large buffer
sizes).
As this bug does not look like it will be fixed in the near future,
it might be an idea to put a fix in place in the Lucene code. I
think it would be safe to read in chunks of up to 100Mb without a
problem, and I don't think it will affect performance to any great
degree.
It's pleasing to see that Lucene can easily handle such huge
indexes, although this bug is obviously quite an impediment to doing
so.
regards,
Cameron Newham
Quoting Michael McCandless <[email protected]>:
Gak, what a horrible bug!
It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs). And it's
been open for almost 2.5 years. I just added a comment & voted for
the
bug.
Does it also occur on a 64 bit JRE?
If you still allocate the full array, but read several smaller chunks
into it, do you still hit the bug?
Mike
[email protected] wrote:
I now know the cause of the problem. Increasing heap space
actually breaks Lucene when reading large indexes.
Details on why can be found here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
Lucene is trying to read a huge block (about 2580Mb) at the point
of failure. While it allocates the required bytes in method
MultiSegmentReader.norms(), line 334, just fine, it is only when
it attempts to use this array in a call to
RandomAccessFile.readBytes() that it gets OOM. This is caused by
a bug in the native code for the Java IO.
As observed in the bug report, large heap space actually causes
the bug to appear. When I reduced my heap from 1200M to 1000M
the exception was never generated and the code completed
correctly and it reported the correct number of search hits in
the Hibernate Search version of my program.
This isn't good - I need as much memory as possible because I
intend to run my search as a web service.
The work-around would be to read the file in small chunks, but I
am not familiar with the Lucene code so I am unsure how that
would be done in a global sense (i.e.: does it really need to
allocate a buffer of that size in MultiSegmentReader?)
The obvious solution (which I haven't tried yet) would be to
patch the point in FSDirectory where the java IO read occurs -
looping with a small buffer for the read and then concatenating
the result back into Lucene's byte array.
Thanks for the comments on this problem from people on this list.
Quoting Ted Dunning <[email protected]>:
try running with verbose gc. That will give you more details
about what is
happening.
Even better, run with jconsole on the side so that you get really
detailed
information on memory pools.
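For the record, Ted's two suggestions look roughly like this on the command line; the main class name here is a placeholder for your own search program:

```shell
# Print a line per garbage collection (heap sizes before/after, pause time).
java -verbose:gc -Xmx1200m -cp . MySearchMain

# Or attach jconsole to the running JVM for live per-pool memory graphs:
jps              # list running JVMs to find the PID
jconsole <pid>   # attach to that process
```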
On Thu, Mar 12, 2009 at 7:30 AM, <[email protected]> wrote:
Thanks Mike and Jokin for your comments on the memory problem. I
have
submitted the query to the Hibernate Search list although I
haven't seen a
response yet.
In the meantime I did my own investigating in the code (I'd
rather have
avoided this!). I'm seeing results that don't make any sense and
maybe
someone here with more experience of Lucene and the way memory
is allocated
by the JVM may shed light on, what to me, are quite illogical
observations.
As you may recall I had a stand-alone Lucene search and a
Hibernate Search
version. Looking in the HS code did not shed any light on the
issue. I took
my stand-alone Lucene code and put it in a method and replaced
the search in
the HS class (the constructor of QueryHits.java) with the call
to my method.
Bear in mind this method is the same code as posted in my
earlier message -
it sets up the Lucene search from scratch (i.e.: no data
structures created
by HS were used). So, effectively I was calling my stand-alone
code after
any setup done by Hibernate and any memory it may have allocated
(which
turned out to be a few Mb).
I get OOM! Printing the free memory at this point shows bags of
memory
left. Indeed, the same free memory (+/- a few Mb) as the stand-alone
Lucene version!
I then instrumented the Lucene method where the OOM is occurring
(FSDirectory.readInternal()). I cannot understand the results
I am seeing.
Below is a snippet of the output of each with the code around
FSDirectory
line 598 as follows:
...
do {
    long tot = Runtime.getRuntime().totalMemory();
    long free = Runtime.getRuntime().freeMemory();
    System.out.println("LUCENE: offset=" + offset + " total=" + total
        + " len-total=" + (len - total) + " free mem=" + free
        + " used =" + (tot - free));
    int i = file.read(b, offset + total, len - total);
...
The stand-alone version:
...
LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used =330080544
LUCENE: offset=0 total=0 len-total=209000000 free mem=631122912 used =617534496
LUCENE: offset=209000000 total=0 len-total=20900000 free mem=631122912 used =617534496
LUCENE: offset=229900000 total=0 len-total=20900000 free mem=631122912 used =617534496
LUCENE: offset=250800000 total=0 len-total=20900000 free mem=631122912 used =617534496
...
completes successfully!
The method called via Hibernate Search:
...
LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used =334892152
LUCENE: offset=0 total=0 len-total=209000000 free mem=636731528 used =622346104
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(Unknown Source)
at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
... fails with exception!
Note that the HS version has slightly more free memory because I
ran it
with -Xms1210M as opposed to -Xms1200M for the stand-alone to
offset any
memory used by HS when it starts up.
As you can see, these are identical for all practical purposes.
So what
gives?
I'm stumped, so any suggestions appreciated.
Thanks.
Quoting Michael McCandless <[email protected]>:
Unfortunately, I'm not familiar with exactly what Hibernate
search does
with the Lucene APIs.
It must be doing something beyond what your standalone Lucene
test case
does.
Maybe ask this question on the Hibernate list?
Mike
--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)