hmm, I seem to be getting a different number of hits when I use the files you sent out.
-----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED]] Sent: 12. november 2001 20:47 To: 'Lucene Users List' Subject: RE: Memory Usage? > From: Anders Nielsen [mailto:[EMAIL PROTECTED]] > > this was a big boolean query, with several prefixqueries but > no wildcard > queries in the or-branches. Well it looks like those prefixes are expanding to a lot of terms, a total of over 40,000! (A prefix query expands into a BooleanQuery with all the terms matching the prefix.) If most of these expansions are low-frequency, then a simple fix should improve things considerably. I've attached an optimized version of TermQuery that will hold less memory per low-frequency term. In particular, if a term occurs fewer than 128 times then a 1024 byte InputStream buffer is freed immediately. Tell me how this works. Please send another heap dump. Longer term, or if lots of the expanded terms occur more than 128 times, perhaps BooleanScorer should use a different algorithm when there are thousands of terms. In this case it might use less memory to construct an array of score buckets for all documents. If (query.termCount() * 1024) > (12 * getMaxDoc()) then this would use less memory. In your case, with 500,000 documents and a 40,000 term query, it's currently taking 40MB/query, and could be done in 6MB/query. This optimization would not be too difficult, as it could be mostly isolated to BooleanQuery and BooleanScorer. Doug -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
