Re: Performance problems with Lucene 2.9

Michel Nadeau Mon, 30 Nov 2009 08:11:17 -0800

What is the main difference between Hits and Collectors?

- Mike
[email protected]



On Mon, Nov 30, 2009 at 11:03 AM, Uwe Schindler <[email protected]> wrote:

> And if you only have a filter and apply it to all documents, make a
> ConstantScoreQuery on top of the filter:
>
> Query q=new ConstantScoreQuery(cluCF);
>
> Then remove the filter from your search method call and only execute this
> query.
>
> And if you iterate over all results never-ever use Hits! (its already
> deprecated). Write a Collector instead (as you are not interested in
> scoring).
>
> And: If you replace a relational database with Lucene, be sure not to think
> in a relational sense with foreign keys / primary keys and so on. In
> general
> you should flatten everything.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
> > -----Original Message-----
> > From: Shai Erera [mailto:[email protected]]
> > Sent: Monday, November 30, 2009 4:56 PM
> > To: [email protected]
> > Subject: Re: Performance problems with Lucene 2.9
> >
> > Hi
> >
> > First you can use MatchAllDocsQuery, which matches all documents. It will
> > save a HUGE posting list (TAG:TAG), and performs much faster. For example
> > TAG:TAG computes a score for each doc, even though you don't need it.
> > MatchAllDocsQuery doesn't.
> >
> > Second, move away from Hits ! :) Use Collectors instead.
> >
> > If I understand the chain of filters, do you think you can code them with
> > a
> > BooleanQuery that is added BooleanClauses, each with is Term
> > (field:value)?
> > You can add clauses w/ OR, AND, NOT etc.
> >
> > Note that in Lucene 2.9, you can avoid scoring documents very easily,
> > which
> > is a performance win if you don't need scores (i.e. if you just want to
> > match everything, not caring for scores).
> >
> > Shai
> >
> > On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > we use Lucene to store around 300 millions of records. We use the index
> > > both
> > > for conventional searching, but also for all the system's data - we
> > > replaced
> > > MySQL with Lucene because it was simply not working at all with MySQL
> > due
> > > to
> > > the amount or records. Our problem is that we have HUGE performance
> > > problems... whenever we search, it takes forever to return results, and
> > > Java
> > > uses 100% CPU/RAM.
> > >
> > > Our index fields are like this:
> > >
> > > TYPE
> > > PK
> > > FOREIGN_PK
> > > TAG
> > > ...other information depending on type...
> > >
> > > * All fields are Field.Index.UN_TOKENIZED
> > > * The field "TAG" always contains the value "TAG".
> > >
> > > Whenever we search in the index, our query is "TAG:TAG" to match all
> > > documents, and we do the search like this:
> > >
> > >        // Search
> > >        Hits h = searcher.search(q, cluCF, cluSort);
> > >
> > > cluCF is a ChainedFilter containing all the other filters (like
> > > FOREIGN_PK=12345, TYPE=a, etc.).
> > >
> > > I know that the method is probably crazy because "TAG:TAG" is matching
> > all
> > > 300M documents and then it applies filters; so that's probably why
> every
> > > little query is taking 100% CPU/RAM.... but I don't know how to do it
> > > properly.
> > >
> > > Help ! Any advice is welcome.
> > >
> > > - Mike
> > > [email protected]
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: Performance problems with Lucene 2.9

Reply via email to