Lucene is trying to allocate the contiguous norms array for your
index, which should be ~273 MB (286 million docs x 1 byte per doc =
286,000,000 bytes / 1024 / 1024), when it hits the OOM.
Is your search sorting by field value? (That would also consume
memory.) Or is it just the default (relevance) sort?
The only other biggish consumer of memory should be the deleted docs,
but that's a BitVector (one bit per doc), so it should need only ~34 MB
of RAM.
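If it helps, here's that arithmetic as a quick runnable sketch (just
the estimates above, not Lucene code; the 286M doc count is taken from
your mail):

// Back-of-the-envelope heap estimates for a 286M-doc index.
public class MemoryEstimate {
    public static void main(String[] args) {
        long maxDoc = 286000000L;
        // Norms: one byte per document, per field with norms enabled.
        double normsMB = maxDoc / 1024.0 / 1024.0;           // ~273 MB
        // Deleted docs: a BitVector packs one bit per document.
        double deletedMB = maxDoc / 8.0 / 1024.0 / 1024.0;   // ~34 MB
        System.out.printf("norms: ~%.0f MB, deleted: ~%.0f MB%n",
                          normsMB, deletedMB);
    }
}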
Can you run a memory profiler to see what else is consuming RAM?
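If attaching a profiler is awkward, one low-effort alternative
(assuming a Sun JDK 5u7+ or 6 JVM) is to have the JVM write a heap
dump at the moment of the OOM, then inspect it afterwards with jhat or
Eclipse MAT:

java -XX:+HeapDumpOnOutOfMemoryError -Xms1200m -Xmx1200m ...

(You can also add -XX:HeapDumpPath=... if the default working
directory isn't a good place for the dump.)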
Mike
[email protected] wrote:
Hello,
I am using Lucene via Hibernate Search but the following problem is
also seen using Luke. I'd appreciate any suggestions for solving
this problem.
I have a Lucene index (27 GB in size) that indexes a database table
of 286 million rows. While Lucene was able to perform this indexing
just fine (albeit very slowly), using the index has proved to be
impossible. Any search conducted on it, whether from my Hibernate
Search query or by pasting the query into Luke, gives:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.MultiReader.norms(MultiReader.java:271)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:230)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
...
The queries are simple, of the form:
(+value:church +marcField:245 +subField:a)
which in this example should return only a few thousand results.
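(For reference, that Luke query corresponds to roughly this
programmatic form; a minimal sketch against the Lucene 2.x API, with
the field names taken from the query above:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QuerySketch {
    // Builds the equivalent of (+value:church +marcField:245 +subField:a).
    static Query buildQuery() {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("value", "church")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("marcField", "245")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("subField", "a")), BooleanClause.Occur.MUST);
        return q;
    }
}
)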
The JVM is already running with the maximum heap space allowed for
the Java executable on Windows XP (java -Xms1200m -Xmx1200m).
The Lucene index was created using the following Hibernate Search
annotations:
@Column
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private Integer marcField;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String subField;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String indicator1;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String indicator2;

@Column(length = 10000)
@Field(index=org.hibernate.search.annotations.Index.TOKENIZED, store=Store.NO)
private String value;

@Column
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private Integer recordId;
So all of the fields have NO_NORMS except for "value", which
contains description text that needs to be tokenised.
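(For comparison, at the raw Lucene level a tokenised field without
norms would look roughly like this; a sketch assuming a Lucene 2.x
Field, which exposes setOmitNorms; the class and method names here are
made up for illustration:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class NoNormsValueField {
    // Adds a tokenized "value" field that skips the per-document norm byte.
    static void addValue(Document doc, String descriptionText) {
        Field value = new Field("value", descriptionText,
                                Field.Store.NO, Field.Index.TOKENIZED);
        value.setOmitNorms(true); // forfeits length normalisation in scoring
        doc.add(value);
    }
}

Omitting norms on "value" would avoid the one-byte-per-document norms
array, at the cost of length normalisation in scoring.)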
Is there any way around this? Does Lucene really have such a low
limit on how much data it can search (and I consider 286 million
documents to be pretty small beer - we were hoping to index a table
of over a billion rows)? Or is there something I'm missing?
Thanks.