[email protected] wrote:
Thanks Michael,

There is no sorting on the result (adding a sort causes OOM well before the point at which the default, unsorted query runs out).

There are no deleted docs - the index was created from a set of docs and no adds or deletes have taken place.

Memory isn't being consumed elsewhere in the system. It all comes down to the Lucene call via Hibernate Search. We decided to split our huge index into a set of several smaller indexes. Like the original single index, each smaller index has one field which is tokenized and the other fields have NO_NORMS set.

The following, explicitly specifying just one index, works fine:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery, MarcText2.class );

But as soon as we start adding further indexes:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery, MarcText2.class, MarcText8.class );

we start running into OOM.

In our case the MarcText2 index has a total disk size of 5 GB (57,589,069 documents / 75,491,779 terms) and MarcText8 has a total size of 6.46 GB (79,339,982 documents / 104,943,977 terms).

Adding all 8 indexes (the same as our original single index), either by explicitly naming them or just with:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery);

results in it becoming completely unusable.
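For what it's worth, here is a rough sketch of what the in-memory term index for these two indexes should cost. The interval of 128 is Lucene's default termIndexInterval; the ~150 bytes per cached entry is my own guess (Term object, term text, TermInfo pointers), not a measured figure:

```java
// Back-of-envelope estimate of the RAM Lucene's TermInfosReader needs per
// index. ASSUMPTIONS: default termIndexInterval of 128, and a rough
// 150 bytes per cached term-index entry; real sizes vary with term length
// and JVM overhead.
public class TermIndexEstimate {

    static final int TERM_INDEX_INTERVAL = 128; // Lucene default
    static final int BYTES_PER_ENTRY = 150;     // rough guess, see above

    /** Term-index entries held in RAM for an index with numTerms terms. */
    static long cachedEntries(long numTerms) {
        return (numTerms + TERM_INDEX_INTERVAL - 1) / TERM_INDEX_INTERVAL;
    }

    /** Approximate heap bytes used by the in-memory term index. */
    static long estimatedBytes(long numTerms) {
        return cachedEntries(numTerms) * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long marcText2Terms = 75_491_779L;   // term count from this thread
        long marcText8Terms = 104_943_977L;  // term count from this thread
        System.out.printf("MarcText2: ~%d entries, ~%d MB%n",
                cachedEntries(marcText2Terms),
                estimatedBytes(marcText2Terms) >> 20);
        System.out.printf("MarcText8: ~%d entries, ~%d MB%n",
                cachedEntries(marcText8Terms),
                estimatedBytes(marcText8Terms) >> 20);
    }
}
```

Under those assumptions each index should need on the order of 100 MB of heap for its term index, which is why the OOM surprises me.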


One thing I am not sure about: for an index (neither of the indexes mentioned above) that was created with NO_NORMS set on all the fields, Luke tells me:

"Index functionality: lock-less, single norms, shared doc store, checksum, del count, omitTf"

Is this correct? I am not sure what it means by "single norms" - I would have expected it to say "no norms".

This is just an expert-level info about the capability of the index format, it doesn't say anything about the actual flags on fields.

Any further ideas on where to go from here? Your estimate of what is loaded into memory suggests that we shouldn't really be anywhere near running out of memory with these size indexes!

As I said in my OP, Luke also gets a heap error on searching our original single large index which makes me wonder if it is a problem with the construction of the index.

In the open index dialog in Luke, set the "Custom term infos divisor" to a value higher than 1 and try to open the index again. If this still doesn't work, make a copy of the index, open the copy in Luke with the "Don't open IndexReader" option, and then run CheckIndex from the menu. PLEASE DO THIS ON A COPY OF THE INDEX.
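To illustrate why the divisor helps (a sketch with assumed numbers: the interval of 128 is Lucene's default, the divisor simply multiplies it at read time), only every (128 * divisor)-th term is kept in RAM, so a divisor of 4 cuts the cached term index to roughly a quarter:

```java
// Sketch of how the term infos divisor shrinks the in-memory term index.
// ASSUMED: Lucene's default termIndexInterval of 128; the divisor is applied
// when the index is opened, so only every (128 * divisor)-th term is cached.
public class DivisorEffect {

    static long cachedEntries(long numTerms, int divisor) {
        long effectiveInterval = 128L * divisor;
        return (numTerms + effectiveInterval - 1) / effectiveInterval;
    }

    public static void main(String[] args) {
        long terms = 104_943_977L; // MarcText8 term count from this thread
        for (int divisor : new int[] {1, 2, 4, 8}) {
            System.out.printf("divisor=%d -> ~%d cached entries%n",
                    divisor, cachedEntries(terms, divisor));
        }
    }
}
```

The trade-off is slower term lookups, since the reader has to scan further from the nearest cached entry, but it often makes an otherwise unopenable index usable.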

Oh, and of course you could start with increasing the heap size when running Luke, but I think that's obvious.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
