Hi! Thanks so much!

* I'll check the documentation for MatchAllDocsQuery.
* I'm already changing my code to create BooleanQueries instead of filters - is that better than MatchAllDocsQuery, or is it the same?
* Is using MatchAllDocsQuery the only way to disable scoring?
* Would you have a good example of how to use Collectors instead of Hits?

- Mike
[email protected]

On Mon, Nov 30, 2009 at 10:56 AM, Shai Erera <[email protected]> wrote:

> Hi
>
> First, you can use MatchAllDocsQuery, which matches all documents. It will
> save a HUGE posting list (TAG:TAG), and performs much faster. For example,
> TAG:TAG computes a score for each doc, even though you don't need it.
> MatchAllDocsQuery doesn't.
>
> Second, move away from Hits! :) Use Collectors instead.
>
> If I understand the chain of filters, do you think you can code them with a
> BooleanQuery to which you add BooleanClauses, each with its Term (field:value)?
> You can add clauses w/ OR, AND, NOT etc.
>
> Note that in Lucene 2.9, you can avoid scoring documents very easily, which
> is a performance win if you don't need scores (i.e. if you just want to
> match everything, not caring for scores).
>
> Shai
>
> On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau <[email protected]> wrote:
>
> > Hi,
> >
> > we use Lucene to store around 300 million records. We use the index both
> > for conventional searching and for all the system's data - we replaced
> > MySQL with Lucene because it was simply not working at all with MySQL due
> > to the amount of records. Our problem is that we have HUGE performance
> > problems... whenever we search, it takes forever to return results, and
> > Java uses 100% CPU/RAM.
> >
> > Our index fields are like this:
> >
> > TYPE
> > PK
> > FOREIGN_PK
> > TAG
> > ...other information depending on type...
> >
> > * All fields are Field.Index.UN_TOKENIZED
> > * The field "TAG" always contains the value "TAG".
> >
> > Whenever we search in the index, our query is "TAG:TAG" to match all
> > documents, and we do the search like this:
> >
> > // Search
> > Hits h = searcher.search(q, cluCF, cluSort);
> >
> > cluCF is a ChainedFilter containing all the other filters (like
> > FOREIGN_PK=12345, TYPE=a, etc.).
> >
> > I know that the method is probably crazy because "TAG:TAG" is matching all
> > 300M documents and then it applies filters; so that's probably why every
> > little query is taking 100% CPU/RAM... but I don't know how to do it
> > properly.
> >
> > Help! Any advice is welcome.
> >
> > - Mike
> > [email protected]
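
For anyone reading this thread later, here is a toy sketch of why replacing "TAG:TAG plus filters" with an AND of the selective terms can help. It is not Lucene code and does not use Lucene's API: PostingListSketch, addDocument, matchingDocs and postingLength are made-up names for a tiny in-memory inverted index, and the sample documents are invented. It only models the idea that an AND over terms like TYPE:a and FOREIGN_PK:12345 walks their (usually short) posting lists, while TAG:TAG has one entry for every document in the index. Real Lucene does this far more cleverly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory inverted index. Hypothetical names; NOT Lucene's API. */
final class PostingListSketch {
    // Maps a term such as "TYPE:a" to the sorted ids of documents containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Adds one document given as "FIELD:value" terms and returns its doc id. */
    int addDocument(String... terms) {
        int docId = docCount++;
        for (String term : terms) {
            // Doc ids only grow, so every posting list stays sorted.
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Number of documents that contain the given term. */
    int postingLength(String term) {
        List<Integer> list = postings.get(term);
        return list == null ? 0 : list.size();
    }

    /** AND of all terms: intersects the posting lists, shortest list first. */
    List<Integer> matchingDocs(String... terms) {
        List<List<Integer>> lists = new ArrayList<>();
        for (String term : terms) {
            List<Integer> list = postings.get(term);
            if (list == null) {
                return new ArrayList<>(); // unknown term, so nothing can match
            }
            lists.add(list);
        }
        if (lists.isEmpty()) {
            return new ArrayList<>();
        }
        // Starting from the shortest list keeps intermediate results small.
        lists.sort((a, b) -> Integer.compare(a.size(), b.size()));
        List<Integer> result = new ArrayList<>(lists.get(0));
        for (int i = 1; i < lists.size(); i++) {
            result = intersect(result, lists.get(i));
        }
        return result;
    }

    /** Merge-style intersection of two sorted lists of doc ids. */
    private static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PostingListSketch index = new PostingListSketch();
        // Invented sample data shaped like the fields described in the thread.
        index.addDocument("TAG:TAG", "TYPE:a", "FOREIGN_PK:12345"); // doc 0
        index.addDocument("TAG:TAG", "TYPE:b", "FOREIGN_PK:12345"); // doc 1
        index.addDocument("TAG:TAG", "TYPE:a", "FOREIGN_PK:99999"); // doc 2
        index.addDocument("TAG:TAG", "TYPE:a", "FOREIGN_PK:12345"); // doc 3

        // The AND of the two selective terms never touches the TAG:TAG list.
        System.out.println(index.matchingDocs("TYPE:a", "FOREIGN_PK:12345")); // prints [0, 3]
        // TAG:TAG has one posting per document in the whole index.
        System.out.println(index.postingLength("TAG:TAG")); // prints 4
    }
}
```

In the toy model the TAG:TAG list grows with the whole index (300M entries in the setup described above), whereas the lists for FOREIGN_PK=12345 or TYPE=a only grow with the number of documents that actually carry those values.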
