[email protected] wrote:
Thanks Michael,

There is no sorting on the result (adding a sort causes OOM well before the point at which the default, unsorted query runs out).

There are no deleted docs - the index was created from a set of docs and no adds or deletes have taken place.

Memory isn't being consumed elsewhere in the system. It all comes down to the Lucene call via Hibernate Search. We decided to split our huge index into a set of several smaller indexes. Like the original single index, each smaller index has one field which is tokenized and the other fields have NO_NORMS set.

The following, explicitly specifying just one index, works fine:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery, MarcText2.class );

But as soon as we start adding further indexes:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery, MarcText2.class, MarcText8.class );

we start running into OOM.

In our case the MarcText2 index has a total disk size of 5 GB (57,589,069 documents / 75,491,779 terms) and MarcText8 has a total size of 6.46 GB (79,339,982 documents / 104,943,977 terms).

Adding all 8 indexes (the same as our original single index), either by explicitly naming them or just with:

org.hibernate.search.FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( outerLuceneQuery);

results in it becoming completely unusable.
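For what it's worth, here is a rough sketch of what the in-memory term index for these two indexes should cost. The interval of 128 is Lucene's default termIndexInterval; the ~150 bytes per cached entry is my own guess (Term object, term text, TermInfo pointers), not a measured figure:

```java
// Back-of-envelope estimate of the RAM Lucene's TermInfosReader needs per
// index. ASSUMPTIONS: default termIndexInterval of 128, and a rough
// 150 bytes per cached term-index entry; real sizes vary with term length
// and JVM overhead.
public class TermIndexEstimate {

    static final int TERM_INDEX_INTERVAL = 128; // Lucene default
    static final int BYTES_PER_ENTRY = 150;     // rough guess, see above

    /** Term-index entries held in RAM for an index with numTerms terms. */
    static long cachedEntries(long numTerms) {
        return (numTerms + TERM_INDEX_INTERVAL - 1) / TERM_INDEX_INTERVAL;
    }

    /** Approximate heap bytes used by the in-memory term index. */
    static long estimatedBytes(long numTerms) {
        return cachedEntries(numTerms) * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long marcText2Terms = 75_491_779L;   // term count from this thread
        long marcText8Terms = 104_943_977L;  // term count from this thread
        System.out.printf("MarcText2: ~%d entries, ~%d MB%n",
                cachedEntries(marcText2Terms),
                estimatedBytes(marcText2Terms) >> 20);
        System.out.printf("MarcText8: ~%d entries, ~%d MB%n",
                cachedEntries(marcText8Terms),
                estimatedBytes(marcText8Terms) >> 20);
    }
}
```

Under those assumptions each index should need on the order of 100 MB of heap for its term index, which is why the OOM surprises me.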


One thing I am not sure about: for an index (neither of the indexes mentioned above) that was created with NO_NORMS set on all the fields, Luke tells me:

"Index functionality: lock-less, single norms, shared doc store, checksum, del count, omitTf"

Is this correct? I am not sure what it means by "single norms" - I would have expected it to say "no norms".

This is just an expert-level info about the capability of the index format, it doesn't say anything about the actual flags on fields.

Any further ideas on where to go from here? Your estimate of what is loaded into memory suggests that we shouldn't really be anywhere near running out of memory with these size indexes!

As I said in my OP, Luke also gets a heap error on searching our original single large index which makes me wonder if it is a problem with the construction of the index.

In the open index dialog in Luke, set the "Custom term infos divisor" to a value higher than 1 and try to open the index again. If this still doesn't work, make a copy of the index, open the copy in Luke with the "Don't open IndexReader" option, and then run CheckIndex from the menu. PLEASE DO THIS ON A COPY OF THE INDEX.
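To illustrate why the divisor helps (a sketch with assumed numbers: the interval of 128 is Lucene's default, the divisor simply multiplies it at read time), only every (128 * divisor)-th term is kept in RAM, so a divisor of 4 cuts the cached term index to roughly a quarter:

```java
// Sketch of how the term infos divisor shrinks the in-memory term index.
// ASSUMED: Lucene's default termIndexInterval of 128; the divisor is applied
// when the index is opened, so only every (128 * divisor)-th term is cached.
public class DivisorEffect {

    static long cachedEntries(long numTerms, int divisor) {
        long effectiveInterval = 128L * divisor;
        return (numTerms + effectiveInterval - 1) / effectiveInterval;
    }

    public static void main(String[] args) {
        long terms = 104_943_977L; // MarcText8 term count from this thread
        for (int divisor : new int[] {1, 2, 4, 8}) {
            System.out.printf("divisor=%d -> ~%d cached entries%n",
                    divisor, cachedEntries(terms, divisor));
        }
    }
}
```

The trade-off is slower term lookups, since the reader has to scan further from the nearest cached entry, but it often makes an otherwise unopenable index usable.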

Oh, and of course you could start with increasing the heap size when running Luke, but I think that's obvious.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
