Hi Erick!

On 13. Jan 2007, at 19:54 , Erick Erickson wrote:

Before going off into modifying things, could you expand a bit on how you
query to build up the filter? Perhaps providing a code snippet?

We pass in the unique ids from our database, which we have to translate to Lucene document ids. The ids arrive through a call to our own API, because the main application isn't written in Java. Lucene will function as a remote service
for the other application servers.

Just to be sure we're talking about the same thing, when you say filter, are you talking about Lucene filters? I'm assuming you are, in which case there is probably wisdom on the list (although I won't provide very much <G>). Building up a Lucene filter with TermEnum/TermDocs has been quite fast in my experience, but I don't know if my experience has any relevance to your
situation...

Yes, I was talking about Lucene filters. Here's what we do currently (pretty much
standard, if I'm correct):

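import java.io.IOException;
import java.util.BitSet;
import java.util.Collection;
import java.util.Iterator;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class IdQueryFilter extends Filter {

    private final Collection users;

    public IdQueryFilter(Collection users) {
        this.users = users;
    }

    // Translates each user's database id into a Lucene document number,
    // doing one term lookup per user.
    public BitSet bits(IndexReader index) throws IOException {
        BitSet result = new BitSet(index.maxDoc());
        Iterator it = users.iterator();
        while (it.hasNext()) {
            Term term = new Term("id", Long.toString(((User) it.next()).id));
            TermDocs terms = index.termDocs(term);
            try {
                // The ids are unique, so only the first match is needed.
                if (terms.next()) {
                    result.set(terms.doc());
                }
            } finally {
                terms.close(); // release the enumerator instead of nulling it
            }
        }
        return result;
    }
}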

This can take up to 30 sec for a large collection of users (~500,000 elements), and
it is the thing I'm currently trying to solve.
I can live with this taking a long time once, since I'm really asking for something Lucene isn't designed for. The real problem is that I can't usefully cache the resulting bitset. I can cache it on one of the Lucene servers, but I can't share it with the other servers (we will eventually have far more than one, for scalability and reliability reasons). We cannot afford to calculate these bitsets on every server: think of a repeated search, or paging, where you cannot be sure of hitting the same Lucene server again and might end up on one that hasn't seen the request before.
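
To make the sharing problem concrete, here is a minimal sketch of what passing a bitset between servers would involve. It only makes sense under an assumption that does not hold for us by default: every server would need an identical copy of the index, so that document numbers mean the same thing everywhere. BitSetCodec is a hypothetical helper name, and the shared cache itself is left out; the sketch only shows the round trip through bytes using plain Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: turns a filter's BitSet into
// bytes and back, so it could be stored in some cache shared by servers.
// ASSUMPTION: all servers search identical index copies, otherwise the
// document numbers in the BitSet are meaningless on another machine.
public class BitSetCodec {

    // Serialize the BitSet with standard Java serialization.
    public static byte[] toBytes(BitSet bits) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buffer);
        try {
            out.writeObject(bits);
        } finally {
            out.close();
        }
        return buffer.toByteArray();
    }

    // Rebuild the BitSet from bytes produced by toBytes().
    public static BitSet fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (BitSet) in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        BitSet original = new BitSet();
        original.set(3);
        original.set(42);
        original.set(499999);

        BitSet copy = fromBytes(toBytes(original));
        System.out.println(copy.equals(original)); // prints "true"
    }
}
```

Even if this worked for us, the bytes would have to be thrown away whenever the index changes, since document numbers can shift after deletes and merges.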

I hope this makes it clearer what I'm up against. I'm not out to change things for change's sake. If I can get around it, fine. If not, I can deal :)

Thanks,

Kay
--
Kay Röpke
http://classdump.org/




