Here is what I've found so far:

I have three main sets to use in a query:

- Account: MUST be xxx
- User query
- DateRange on the query: MUST be in (a,b); it is a NumericField

I tried the following combinations (all using a BooleanQuery with the
user query added to it):

1. One:
   - Add ACCOUNT as a TermQuery
   - Add DATE RANGE as a Filter
2. Two:
   - Add ACCOUNT as a Filter
   - Add DATE RANGE as a NumericRangeQuery

I tried caching the filters in both scenarios. I also tried both
scenarios with the query passed as a ConstantScoreQuery.

I got the best result (about 4x faster) by using a cached filter for the
DATE RANGE and leaving the ACCOUNT as a TermQuery. I think I'm happy with
this approach. However, the security risk Uwe mentioned when using
ACCOUNT as a Query makes me nervous. Any suggestions?

As for document distribution, the ACCOUNTS have a similar distribution
of documents.

Also, I would still like to try the multi-index approach, but I'm not
sure about the memory and file handle burden of it (potentially having
thousands of readers/writers/searchers open at the same time).

I use two processes, one as the indexer and one for search, with the
same underlying FSDirectory. For search, I use
writer.getReader().reopen within a SearchManager, as suggested by
Lucene in Action.

On 24 October 2010 10:27, Paul Elschot <[email protected]> wrote:

> On Sunday 24 October 2010 00:18:48, Khash Sajadi wrote:
> > My index contains documents for different users. Each document has
> > the user id as a field on it.
> >
> > There are about 500 different users with 3 million documents.
> >
> > Currently I'm calling Search with the query (parsed from user)
> > and FieldCacheTermsFilter for the user id.
> >
> > It works but the performance is not great.
> >
> > Ideally, I would like to perform the search only on the documents
> > that are relevant, this should make it much faster. However, it
> > seems Search(Query, Filter) runs the query first and then applies
> > the filter.
> >
> > Is there a way to improve this? (i.e. run the query only on a
> > subset of documents)
> >
> > Thanks
>
> When running the query with the filter, the query is run at the same
> time as the filter. Initially and after each matching document, the
> filter is assumed to be cheaper to execute and its first or next
> matching document is determined. Then the query and the filter are
> repeatedly advanced to each other's next matching document until they
> are at the same document (i.e. there is a match), similar to a boolean
> query with two required clauses.
> The Java code doing this is in the private method
> IndexSearcher.searchWithFilter().
>
> It could be that filling the field cache is the performance problem.
> How is the performance when this search call with the
> FieldCacheTermsFilter is repeated?
>
> Also, for a single indexed term to be used as a filter (the user id
> in this case) there may be no need for a cache, a QueryWrapperFilter
> around the TermQuery might suffice.
>
> Regards,
> Paul Elschot
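
To make the matching loop Paul describes more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not the actual IndexSearcher.searchWithFilter() code: the class LeapfrogSketch, the SortedDocs iterator and the intersect method are hypothetical names made up for illustration, and both the filter and the query are assumed to be plain sorted arrays of document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of leapfrog matching; not Lucene code. */
final class LeapfrogSketch {

    /** Sentinel returned when an iterator has no further documents. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over a sorted array of document ids (assumption). */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int[] sortedDocIds) {
            this.docs = sortedDocIds;
        }

        /** Moves to the next document id, or returns NO_MORE_DOCS. */
        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first document id >= target, or returns NO_MORE_DOCS. */
        int advance(int target) {
            if (pos < 0) {
                pos = 0;
            }
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns the document ids present in both sorted inputs. */
    static int[] intersect(int[] filterDocs, int[] queryDocs) {
        SortedDocs filter = new SortedDocs(filterDocs);
        SortedDocs query = new SortedDocs(queryDocs);
        List<Integer> hits = new ArrayList<>();

        // The filter goes first: it is assumed to be the cheaper side.
        int filterDoc = filter.nextDoc();
        int queryDoc = query.advance(filterDoc);

        while (filterDoc != NO_MORE_DOCS && queryDoc != NO_MORE_DOCS) {
            if (queryDoc == filterDoc) {
                // Both sides sit on the same document: a match.
                hits.add(queryDoc);
                filterDoc = filter.nextDoc();
                queryDoc = query.advance(filterDoc);
            } else if (queryDoc > filterDoc) {
                // The filter is behind: skip it forward to the query's document.
                filterDoc = filter.advance(queryDoc);
            } else {
                // The query is behind: skip it forward to the filter's document.
                queryDoc = query.advance(filterDoc);
            }
        }

        int[] result = new int[hits.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] accountFilter = {1, 3, 5, 7, 9};
        int[] userQuery = {3, 4, 7, 10};
        System.out.println(Arrays.toString(intersect(accountFilter, userQuery)));
    }
}
```

In this sketch neither side is evaluated in full before the other; each one only skips ahead to the other's current candidate, so the query is not run over the whole index first and filtered afterwards.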
