I now know the cause of the problem. Increasing heap space actually
breaks Lucene when reading large indexes.
Details on why can be found here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
Lucene is trying to read a huge block (about 2580Mb) at the point of
failure. While it allocates the required bytes in method
MultiSegmentReader.norms(), line 334, just fine, it is only when it
attempts to use this array in a call to RandomAccessFile.readBytes()
that it gets OOM. This is caused by a bug in the native code for the
Java IO.
As observed in the bug report, large heap space actually causes the
bug to appear. When I reduced my heap from 1200M to 1000M the
exception was never generated and the code completed correctly and it
reported the correct number of search hits in the Hibernate Search
version of my program.
This isn't good - I need as much memory as possible because I intend
to run my search as a web service.
The work-around would be to read the file in small chunks, but I am
not familiar with the Lucene code so I am unsure how that would be
done in a global sense (i.e.: does it really need to allocate a buffer
of that size in MultiSegmentReader?)
The obvious solution (which I haven't tried yet) would be to patch the
point in FSDirectory where the java IO read occurs - looping with a
small buffer for the read and then concatenating the result back into
Lucene's byte array.
Thanks for the comments on this problem from people on this list.
Quoting Ted Dunning <[email protected]>:
try running with verbose gc. That will give you more details about what is
happening.
Even better, run with jconsole on the side so that you get really detailed
information on memory pools.
On Thu, Mar 12, 2009 at 7:30 AM, <[email protected]> wrote:
Thanks Mike and Jokin for your comments on the memory problem. I have
submitted the query to the Hibernate Search list although I haven't seen a
response yet.
In the meantime I did my own investigating in the code (I'd rather have
avoided this!). I'm seeing results that don't make any sense and maybe
someone here with more experience of Lucene and the way memory is allocated
by the JVM may shed light on, what to me, are quite illogical observations.
As you may recall I had a stand-alone Lucene search and a Hibernate Search
version. Looking in the HS code did not shed any light on the issue. I took
my stand-alone Lucene code and put it in a method and replaced the search in
the HS class (the constructor of QueryHits.java) with the call to my method.
Bear in mind this method is the same code as posted in my earlier message -
it sets up the Lucene search from scratch (i.e.: no data structures created
by HS were used). So, effectively I was calling my stand-alone code after
any setup done by Hibernate and any memory it may have allocated (which
turned out to be a few Mb).
I get OOM! Printing the free memory at this point shows bags of memory
left. Indeed, the same free memory (+/- a few Mb) as the stand-alone Lucene
version!
I then instrumented the Lucene method where the OOM is occuring
(FSDirectory.readInternal()). I cannot understand the results I am seeing.
Below is a snippet of the output of each with the code around FSDirectory
line 598 as follows:
...
do {
long tot = Runtime.getRuntime().totalMemory();
long free =Runtime.getRuntime().freeMemory();
System.out.println("LUCENE: offset="+offset+" total="+total+"
len-total="+(len-total)+" free mem="+free+" used ="+(tot-free));
int i = file.read(b, offset+total, len-total);
...
The stand-alone version:
...
LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
=330080544
LUCENE: offset=0 total=0 len-total=209000000 free mem=631122912 used
=617534496
LUCENE: offset=209000000 total=0 len-total=20900000 free mem=631122912
used =617534496
LUCENE: offset=229900000 total=0 len-total=20900000 free mem=631122912
used =617534496
LUCENE: offset=250800000 total=0 len-total=20900000 free mem=631122912
used =617534496
...
completes successfully!
The method called via Hibernate Search:
...
LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
=334892152
LUCENE: offset=0 total=0 len-total=209000000 free mem=636731528 used
=622346104
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(Unknown Source)
at
org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
... fails with exception!
Note that the HS version has slightly more free memory because I ran it
with -Xms1210M as opposed to -Xms1200M for the stand-alone to offset any
memory used by HS when it starts up.
As you can see, these are identical for all practical purposes. So what
gives?
I'm stumped, so any suggestions appreciated.
Thanks.
Quoting Michael McCandless <[email protected]>:
Unfortunately, I'm not familiar with exactly what Hibernate search does
with the Lucene APIs.
It must be doing something beyond what your standalone Lucene test case
does.
Maybe ask this question on the Hibernate list?
Mike
--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)