Re: Performance problems with Lucene 2.9

Michel Nadeau Mon, 30 Nov 2009 08:04:17 -0800

Hi !

Thanks so much !!


* I'll check the documentation for MatchAllDocsQuery.
* I'm already changing my code to create BooleanQueries instead of filters.
Is that better than MatchAllDocsQuery, or is it the same?
* Is using MatchAllDocsQuery the only way to disable scoring?
* Would you have any good example of how to use Collectors instead of Hits?

- Mike


On Mon, Nov 30, 2009 at 10:56 AM, Shai Erera wrote:

> Hi
>
> First, you can use MatchAllDocsQuery, which matches all documents. It
> avoids reading the HUGE posting list for TAG:TAG and runs much faster.
> For example, TAG:TAG computes a score for each doc even though you don't
> need it; MatchAllDocsQuery doesn't.
>
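A minimal sketch of the first suggestion, assuming the Lucene 2.9 jar is on the classpath: TAG:TAG is replaced by MatchAllDocsQuery, the filter stays, and the deprecated Hits call is replaced by the search(Query, Filter, int, Sort) overload, which returns a TopFieldDocs. The three-document RAMDirectory index, the field values and the class and method names are hypothetical, and the QueryWrapperFilter on TYPE only stands in for the real ChainedFilter.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;

public class MatchAllExample {

    // Returns the PKs of all documents of the given TYPE, sorted by PK.
    static List<String> pksOfType(String type) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Hypothetical documents: {TYPE, PK}
        String[][] rows = { {"a", "1"}, {"b", "2"}, {"a", "3"} };
        for (String[] row : rows) {
            Document doc = new Document();
            doc.add(new Field("TYPE", row[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("PK", row[1], Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        // Stand-in for the real ChainedFilter: keep only documents of this TYPE.
        Filter filter = new QueryWrapperFilter(new TermQuery(new Term("TYPE", type)));
        Sort sort = new Sort(new SortField("PK", SortField.STRING));
        // MatchAllDocsQuery replaces TAG:TAG; this overload returns TopFieldDocs, not Hits.
        TopFieldDocs top = searcher.search(new MatchAllDocsQuery(), filter, 10, sort);
        List<String> pks = new ArrayList<String>();
        for (int i = 0; i < top.scoreDocs.length; i++) {
            pks.add(searcher.doc(top.scoreDocs[i].doc).get("PK"));
        }
        searcher.close();
        return pks;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(pksOfType("a"));
    }
}
```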
> Second, move away from Hits ! :) Use Collectors instead.
>
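For moving away from Hits, here is a minimal sketch of a custom Lucene 2.9 Collector (the class name DocIdCollector is hypothetical) that records the doc ID of every match and never asks for a score. It assumes Lucene 2.9 on the classpath; the tiny in-memory index and the TYPE values are made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class DocIdCollectorExample {

    // Collects index-wide doc IDs of every match; scores are never requested.
    static class DocIdCollector extends Collector {
        final List<Integer> docIds = new ArrayList<Integer>();
        private int docBase;

        @Override
        public void setScorer(Scorer scorer) {
            // Scores are not needed, so the scorer is ignored.
        }

        @Override
        public void collect(int doc) {
            // doc is relative to the current segment; docBase makes it index-wide.
            docIds.add(docBase + doc);
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) {
            this.docBase = docBase;
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            // Collection order does not matter here.
            return true;
        }
    }

    static List<Integer> collectType(String type) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (String t : new String[] {"a", "b", "a"}) {
            Document doc = new Document();
            doc.add(new Field("TYPE", t, Field.Store.NO, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        DocIdCollector collector = new DocIdCollector();
        searcher.search(new TermQuery(new Term("TYPE", type)), collector);
        searcher.close();
        return collector.docIds;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(collectType("a"));
    }
}
```

Because IndexSearcher in 2.9 searches segment by segment, collect() receives per-segment doc IDs; adding the docBase passed to setNextReader() turns them into IDs that searcher.doc() accepts.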
> If I understand your chain of filters, could you express them as a
> BooleanQuery built from BooleanClauses, each wrapping a TermQuery
> (field:value)? Clauses can be combined as AND, OR and NOT (Occur.MUST,
> Occur.SHOULD and Occur.MUST_NOT).
>
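A sketch of the BooleanQuery approach, assuming each filter in the chain is a plain field:value match on an untokenized field and Lucene 2.9 is on the classpath. The criteria (TYPE=a, FOREIGN_PK=12345, excluding PK=3), the sample documents and the class name are hypothetical; MUST, SHOULD and MUST_NOT play the roles of AND, OR and NOT.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;

public class BooleanQueryExample {

    // Shorthand for an exact field:value clause on an untokenized field.
    static TermQuery term(String field, String value) {
        return new TermQuery(new Term(field, value));
    }

    static List<String> search() throws IOException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Hypothetical rows: {TYPE, FOREIGN_PK, PK}
        String[][] rows = { {"a", "12345", "1"}, {"b", "12345", "2"},
                {"a", "12345", "3"}, {"a", "999", "4"}, {"a", "12345", "5"} };
        for (String[] row : rows) {
            Document doc = new Document();
            doc.add(new Field("TYPE", row[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("FOREIGN_PK", row[1], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("PK", row[2], Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        // TYPE=a AND FOREIGN_PK=12345 AND NOT PK=3
        BooleanQuery query = new BooleanQuery();
        query.add(term("TYPE", "a"), BooleanClause.Occur.MUST);
        query.add(term("FOREIGN_PK", "12345"), BooleanClause.Occur.MUST);
        query.add(term("PK", "3"), BooleanClause.Occur.MUST_NOT);

        IndexSearcher searcher = new IndexSearcher(dir, true);
        TopFieldDocs top = searcher.search(query, null, 10,
                new Sort(new SortField("PK", SortField.STRING)));
        List<String> pks = new ArrayList<String>();
        for (int i = 0; i < top.scoreDocs.length; i++) {
            pks.add(searcher.doc(top.scoreDocs[i].doc).get("PK"));
        }
        searcher.close();
        return pks;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(search());
    }
}
```

A BooleanQuery made only of MUST_NOT clauses matches nothing, so a purely negative condition still needs a positive clause, for example a MatchAllDocsQuery added with Occur.MUST.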
> Note that in Lucene 2.9 you can very easily avoid scoring documents,
> which is a performance win if you don't need scores (e.g. if you just
> want to match everything without caring about scores).
>
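One way to skip scoring in 2.9, sketched under the assumption that results only need to be sorted by a field: TopFieldCollector.create() with trackDocScores and trackMaxScore set to false gathers the sorted top hits without asking the scorer for scores. The index contents, the PK sort field, the TYPE filter and the class name are hypothetical, and Lucene 2.9 is assumed on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldCollector;
import org.apache.lucene.store.RAMDirectory;

public class NoScoreSortExample {

    // Returns up to n PKs of the given TYPE, sorted by PK, without tracking scores.
    static List<String> sortedPks(String type, int n) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Hypothetical documents: {TYPE, PK}
        String[][] rows = { {"a", "3"}, {"b", "2"}, {"a", "1"} };
        for (String[] row : rows) {
            Document doc = new Document();
            doc.add(new Field("TYPE", row[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("PK", row[1], Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        Sort sort = new Sort(new SortField("PK", SortField.STRING));
        // fillFields=true, trackDocScores=false, trackMaxScore=false, docsScoredInOrder=true
        TopFieldCollector collector = TopFieldCollector.create(sort, n, true, false, false, true);
        searcher.search(new MatchAllDocsQuery(),
                new QueryWrapperFilter(new TermQuery(new Term("TYPE", type))), collector);
        List<String> pks = new ArrayList<String>();
        for (ScoreDoc sd : collector.topDocs().scoreDocs) {
            pks.add(searcher.doc(sd.doc).get("PK"));
        }
        searcher.close();
        return pks;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sortedPks("a", 10));
    }
}
```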
> Shai
>
> On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau wrote:
>
> > Hi,
> >
> > we use Lucene to store around 300 million records. We use the index both
> > for conventional searching and for all of the system's data: we replaced
> > MySQL with Lucene because MySQL simply could not cope with the number of
> > records. Our problem is that we have HUGE performance problems...
> > whenever we search, it takes forever to return results, and Java uses
> > 100% CPU/RAM.
> >
> > Our index fields are like this:
> >
> > TYPE
> > PK
> > FOREIGN_PK
> > TAG
> > ...other information depending on type...
> >
> > * All fields are Field.Index.UN_TOKENIZED
> > * The field "TAG" always contains the value "TAG".
> >
> > Whenever we search in the index, our query is "TAG:TAG" to match all
> > documents, and we do the search like this:
> >
> >        // Search
> >        Hits h = searcher.search(q, cluCF, cluSort);
> >
> > cluCF is a ChainedFilter containing all the other filters (like
> > FOREIGN_PK=12345, TYPE=a, etc.).
> >
> > I know that the method is probably crazy, because "TAG:TAG" matches all
> > 300M documents before the filters are applied; that's probably why every
> > little query takes 100% CPU/RAM... but I don't know how to do it
> > properly.
> >
> > Help ! Any advice is welcome.
> >
> > - Mike
> >
>
