Re: combine results from multiple queries & sort

Jamie Wed, 14 Mar 2012 04:40:26 -0700

Li

Many thanks for the tip. I used the searchWithFilter approach and it's working brilliantly!!


For the benefit of others, the solution is as follows:

TermsFilter idFilter = new TermsFilter();
for (String id : ids) {
    idFilter.addTerm(new Term("uid", id));
}
searcher.search(query, idFilter, tfc);

Regards

Jamie
On 2012/03/14 12:44 PM, Li Li wrote:

It's a very common problem. Many of our users (including programmers who are
familiar with SQL) have the same question.

Compared with SQL, all queries in Lucene are based on an inverted index.
Fortunately, when searching, we can provide a Filter.

From the source code of the searchWithFilter function, we can see that
searching with a filter is similar to a boolean AND query.

I think you can use TermsFilter.
It just iterates through the terms (your ids) and uses a BitSet to do the filtering.
If a document contains any of the terms, its bit is set to 1; otherwise it is 0.

I think this implementation is fast enough. It uses the tii/tis files to locate
terms, and for each term it iterates through its postings in the frq file.
Postings may be cached by Lucene.
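
To make the BitSet idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's actual TermsFilter code: the in-memory Map standing in for the inverted index, the buildFilter helper, and the id "ffee01" are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermsFilterSketch {

    // Toy inverted index (assumption): term -> doc ids that contain it (its postings).
    static BitSet buildFilter(Map<String, int[]> postingsByTerm, List<String> ids, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : ids) {
            int[] postings = postingsByTerm.get(id);
            if (postings == null) {
                continue; // id does not occur in the index
            }
            for (int doc : postings) {
                bits.set(doc); // document contains one of the ids: bit = 1
            }
        }
        return bits; // every other document stays at 0
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new HashMap<>();
        index.put("123aeeff", new int[] {0});
        index.put("34eacc", new int[] {2});
        index.put("ffee01", new int[] {3}); // hypothetical extra id, not requested below

        BitSet allowed = buildFilter(index, Arrays.asList("123aeeff", "34eacc", "missing"), 4);
        System.out.println(allowed); // prints {0, 2}
    }
}
```

During a search, only documents whose bit is set would be passed on to the collector, which is why the filtered search behaves like an AND with the main query.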

If it can't meet your performance needs, you can implement your own
Collector with your own cache policy (maybe load the whole field into
memory as a HashMap of your ids -> document id).
When a query is "id in (1,3,5)", you construct a Collector. When it
collects docs, you filter out the unwanted documents.
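
A rough sketch of that Collector idea, again in plain Java without the Lucene classes. The DocCollector interface is a hypothetical stand-in for Lucene's Collector, and the docByUid map is the assumed in-memory cache of ids -> document id. In a real index such a cache would probably have to be rebuilt whenever the reader is reopened, since document ids are not stable across index changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdFilteringCollectorSketch {

    /** Hypothetical stand-in for a Lucene Collector: called once per matching doc. */
    interface DocCollector {
        void collect(int doc);
    }

    static class IdFilteringCollector implements DocCollector {
        private final Set<Integer> wantedDocs = new HashSet<>();
        private final List<Integer> hits = new ArrayList<>();

        // docByUid is the assumed cache: uid field value -> document id.
        IdFilteringCollector(Map<String, Integer> docByUid, Collection<String> ids) {
            for (String id : ids) {
                Integer doc = docByUid.get(id);
                if (doc != null) {
                    wantedDocs.add(doc);
                }
            }
        }

        @Override
        public void collect(int doc) {
            // Keep only documents whose uid was in the "id in (...)" list.
            if (wantedDocs.contains(doc)) {
                hits.add(doc);
            }
        }

        List<Integer> hits() {
            return hits;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> docByUid = new HashMap<>();
        docByUid.put("123aeeff", 0);
        docByUid.put("34eacc", 2);

        IdFilteringCollector collector =
                new IdFilteringCollector(docByUid, Arrays.asList("123aeeff", "34eacc"));
        for (int doc : new int[] {0, 1, 2, 3}) { // docs matched by the main query
            collector.collect(doc);
        }
        System.out.println(collector.hits()); // prints [0, 2]
    }
}
```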

On Wed, Mar 14, 2012 at 4:01 PM, Jamie<[email protected]>  wrote:

Greetings!

First off, I realize Lucene is a search engine and therefore does not
possess many of the features of a database. That being said, I have
encountered a particular use case where I need to look up potentially
thousands of records in a Lucene index based upon an ID (a String field in
the index). This data also needs to be sorted based upon any chosen field in
the index. In pseudo code, this is how it's currently done:

String[] ids = { "123aeeff", "34eacc", ... };

results.clear();
StringBuffer lookupQuery = new StringBuffer();
for (int i = 0; i < ids.length; i++) {
    lookupQuery.append(ids[i]);
    lookupQuery.append(" ");
    // run a search for every batch of 1024 ids
    if ((i + 1) % 1024 == 0) {
        search(lookupQuery.toString());
        lookupQuery = new StringBuffer();
    }
}
// search for any ids left over after the last full batch
if (lookupQuery.length() > 0) {
    search(lookupQuery.toString());
}

As you can see, in a loop, Lucene queries are constructed with a maximum
of 1024 terms each, for example, consisting of IDs "123aeeff 34eacc ..". After
each query in the loop is constructed, a search is executed and then the
results are combined into a single linked list (this is done in the search
function). This works well aside from two outstanding questions:

1. Is executing separate search queries the best way to look up
   thousands of records in an index? Is there a more efficient way to
   look up thousands of records based upon ID?
2. The results are unsorted after they are combined into a single
   linked list. What is the best way to sort the combined results based
   upon any chosen field in the Lucene index? Is there a way to do that
   which would leverage Lucene's built-in sort abilities?

Many thanks for your consideration

Jamie


