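There is a MultiPhraseQuery we use which looks a bit like this:

    // "field" is a placeholder for our real field name
    MultiPhraseQuery query = new MultiPhraseQuery();
    query.add(new Term[] { new Term("field", "first") });
    query.add(new Term[] { new Term("field", "second1"), new Term("field", "second2"), ... });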
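The following is a toy, in-memory model in plain Java, with no Lucene involved, of what the query above matches: exact (slop 0) phrase matching where each position accepts a set of alternative terms. It assumes, as described further down, that the first position holds a sub-field name and the second holds every value in the requested range. The class, the methods `matches` and `rangeTerms`, and the sample tokens are hypothetical. It is not how Lucene executes MultiPhraseQuery, which works from per-term postings lists rather than scanning a document's tokens.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy, in-memory model of exact (slop 0) multi-phrase matching: phrase position i
// accepts any term in alternatives.get(i). Not Lucene's implementation, which
// works from per-term postings lists instead of scanning a document's tokens.
public class MultiPhraseSketch {

    // True if some run of consecutive tokens matches every phrase position in order.
    public static boolean matches(List<String> tokens, List<Set<String>> alternatives) {
        int n = alternatives.size();
        for (int start = 0; start + n <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                ok = alternatives.get(i).contains(tokens.get(start + i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Expands an inclusive integer range [lo, hi] into one term per value,
    // mirroring a second position that lists every value in the range.
    public static Set<String> rangeTerms(int lo, int hi) {
        Set<String> terms = new HashSet<>();
        for (int v = lo; v <= hi; v++) {
            terms.add(Integer.toString(v));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical document: each sub-field name token is followed by its value token.
        List<String> doc = List.of("height", "7", "width", "11");

        // "width" immediately followed by any value in 10..12
        List<Set<String>> widthQuery = List.of(Set.of("width"), rangeTerms(10, 12));
        System.out.println(matches(doc, widthQuery)); // true

        // "height" immediately followed by any value in 10..12 (height is 7)
        List<Set<String>> heightQuery = List.of(Set.of("height"), rangeTerms(10, 12));
        System.out.println(matches(doc, heightQuery)); // false
    }
}
```

Because the second position is enumerated value by value, its term set grows with the width of the range, which is presumably how a single query of this shape ends up with so many terms.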
The actual number of terms at the second position in this particular case is 207,087. The index itself is about 21 GB, with around 1,300,000 documents: large, but not gigantic. I ran the test with 2 GB of RAM, which was certainly enough under Lucene 3.

Although I do think this is abusing MultiPhraseQuery and that SpanQuery is probably a better fit, I believe that back in Lucene 3 there were problems with SpanQuery performance, which is why we switched to this as a performance hack.

Anyway, we now get an OOME when running this query, and the heap histogram looks roughly like this:

    Class                                     Instances          Size (bytes)
    int[]                                       995,093 (5.2%)   617,539,592 (31.6%)
    byte[]                                    1,065,597 (5.6%)   434,990,616 (22.3%)
    DocIdSet[]                                  777,620 (4.1%)   149,303,040 (7.6%)
    Lucene50PostingsReader$BlockPostingsEnum    326,022 (1.7%)    67,486,554 (3.5%)
    Lucene50PostingsFormat$IntBlockTermState    621,265 (3.2%)    57,777,645 (3%)

I went looking for the owner of these int arrays, and it turns out to be a postings reader which is ultimately (unsurprisingly) held by the MultiPhraseQuery.

What I'm wondering is:

- Why has the memory cost increased?
- Is our performance hack of using MultiPhraseQuery instead of SpanQuery really still warranted?
- Is there a better way to do this particular query?

Also, just in case this is an XY problem: what we're actually implementing here is a simulation of a large number of integer fields without using a large number of fields. We index the name of the sub-field followed by the value, and then use this as a proximity query meaning "find values in the range X to Y with the sub-field immediately in front". This was done because conventional wisdom held that having a large number of fields in Lucene is problematic, although I don't know whether that still applies.

TX