Re: Making document numbers persistent

Kay Roepke Sat, 13 Jan 2007 17:05:28 -0800

Hi Erick!

On 13. Jan 2007, at 19:54 , Erick Erickson wrote:

Before going off into modifying things, could you expand a bit on how you
query to build up the filter? Perhaps providing a code snippet?

We are passing in our unique ids from our database, which we have to translate to Lucene document ids. This is done by an API call (our own API), because the main application isn't written in Java. Lucene will function as a remote service for the other application servers.

Just to be sure we're talking about the same thing, when you say filter, are you talking about Lucene filters? I'm assuming you are, in which case there is probably wisdom on the list (although I won't provide very much <G>). Building up a Lucene filter with TermEnum/TermDocs has been quite fast in my experience, but I don't know if my experience has any relevance to your situation....

Yes, I was talking about Lucene filters. Here's what we do currently (pretty much standard, if I'm correct):

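import java.io.IOException;
import java.util.BitSet;
import java.util.Collection;
import java.util.Iterator;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class IdQueryFilter extends Filter {

    // Users (from our database) whose documents should pass the filter.
    Collection users;

    public IdQueryFilter(Collection users) {
        this.users = users;
    }

    public BitSet bits(IndexReader index) throws IOException {
        BitSet result = new BitSet(index.maxDoc());
        Iterator it = users.iterator();
        while (it.hasNext()) {
            // One term lookup per user: translate the database id to a document number.
            Term term = new Term("id", String.valueOf(((User) it.next()).id));
            TermDocs terms = index.termDocs(term);
            try {
                // The id is unique, so only the first matching document is needed.
                if (terms.next()) {
                    result.set(terms.doc());
                }
            } finally {
                terms.close();
            }
        }
        return result;
    }
}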

This can take up to 30 seconds for a large collection of users (~500,000 elements), and it is the thing I'm currently trying to solve.

I can handle situations where this takes a long time once, since I'm really asking for something that Lucene isn't designed for. The culprit is that I can't really cache the resulting bitset. I can cache it on one of the Lucene servers, but can't share it among the rest of the servers (we will eventually have way more than one for scalability/reliability reasons). We cannot afford to calculate these bitsets on all servers. Think of a repeated search, or paging, when you cannot make sure that you will hit the same Lucene application to do the search: you might end up on a different server that hasn't seen the request before.
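
To make the sharing problem concrete, here is a minimal sketch of how a computed bitset could be packed into bytes for a shared cache. The class name BitSetTransfer and its two methods are hypothetical, and it relies on BitSet.toByteArray() and BitSet.valueOf(), which need Java 7 or later. It also assumes every server assigns the same document number to the same document, which is exactly what is not guaranteed and why persistent document numbers matter here.

```java
import java.util.BitSet;

// Hypothetical helper: packs a filter BitSet into bytes and back.
public class BitSetTransfer {

    /** Packs the set bits into a compact byte array. */
    public static byte[] toBytes(BitSet bits) {
        return bits.toByteArray();
    }

    /** Rebuilds a BitSet from bytes produced by toBytes. */
    public static BitSet fromBytes(byte[] bytes) {
        return BitSet.valueOf(bytes);
    }

    public static void main(String[] args) {
        // Stand-in for the result of a filter: a few document numbers.
        BitSet original = new BitSet();
        original.set(3);
        original.set(42);
        original.set(499999);

        byte[] wire = toBytes(original);   // what would go into a shared cache
        BitSet copy = fromBytes(wire);     // what another server would read back

        System.out.println(copy.equals(original)); // prints "true"
    }
}
```

A set whose highest document number is 499,999 packs into 62,500 bytes, so moving it between servers is cheap. The hard part is that the bits only mean the same thing on another server if its document numbers match.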

I hope this makes it clearer what I'm up against. I'm not running around changing things for change's sake. If I can get around it, fine. If not, I can deal :)


Thanks,

Kay
--
Kay Röpke
http://classdump.org/




