Re: Making document numbers persistent

Kay Roepke Sat, 13 Jan 2007 17:15:07 -0800

Hi Karl!

On 13. Jan 2007, at 20:12, karl wettin wrote:

On 13 Jan 2007, at 19:14, Kay Roepke wrote:
All of the users (documents we index) are "connected" to certain other users, in a network fashion. We must be able to restrict the query (or filter it after searching the complete index) to certain "levels of connectedness", i.e. you can search within say three hops of yourself. We compute a list of user ids which are in the set of the applicable "contacts". This information about connection cannot be stored in the index, as it is changing often and is expensive to compute in advance.

Given you don't have too many users, you could do it the other way around, storing the distance to all users from each document.

Unfortunately (or fortunately, depending how you look at it ;)), we do have a lot of users.

Precalculation isn't an option with the rate of change in that database.

But the bottom line really is that Lucene is not designed for this sort of thing.

Yeah, exactly what I have been muttering for the last few days now. Nevertheless Lucene is ideal for all the other types of searches we want to provide. This "network search" option is the only thing that really gets in the way, and I'd hate to have to abandon Lucene for that reason alone.

If I were you, I would make a filter that navigates an in-heap object graph of all users and their connections using a breadth-first search (or perhaps even A*). I'm certain it will turn out to be much simpler to maintain, and probably several thousand times faster than implementing such a feature in the Lucene storage.
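
For illustration, the breadth-first walk suggested here could look roughly like the sketch below. It is not code from this thread: the class name `ContactGraph`, the method `withinHops` and the adjacency-map layout are all assumptions, and it only computes the set of user ids within a hop limit, not a Lucene filter.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-heap graph of users; names and layout are assumptions.
class ContactGraph {
    // Adjacency list: user id -> ids of directly connected users.
    private final Map<String, List<String>> edges;

    ContactGraph(Map<String, List<String>> edges) {
        this.edges = edges;
    }

    // Returns the ids of all users reachable from start in at most
    // maxHops hops, excluding start itself.
    Set<String> withinHops(String start, int maxHops) {
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String user = queue.remove();
            int d = distance.get(user);
            if (d == maxHops) {
                continue; // do not expand beyond the hop limit
            }
            for (String next : edges.getOrDefault(user, Collections.emptyList())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1); // first visit is the shortest path
                    queue.add(next);
                }
            }
        }
        Set<String> result = new HashSet<>(distance.keySet());
        result.remove(start);
        return result;
    }
}
```

The result is a set of user ids, so it still has to be mapped onto index documents before it can restrict a search.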

An option I have considered previously, too. I haven't had time to implement it, though. Anyway, I do have a list of the connected users from a different source, I don't expect to get that information out of Lucene. Sorry if I haven't made that clear in my earlier post. My problem is really only to filter the search results to just include the previously calculated documents.

I would essentially have the same problem with an in-memory graph: I cannot be sure of the Lucene document ids those users will have, so I would have to look those up - exactly as I do now. The bottleneck would remain the same :(
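
As a rough illustration of the lookup step described above: every contact's user id has to be translated into a document number before the result set can be restricted. In this sketch `docIdLookup` is a hypothetical stand-in for whatever per-user index lookup is actually used, and `ContactFilterSketch` and `allowedDocs` are invented names, not Lucene API.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.function.ToIntFunction;

// Sketch only: turns a list of contact user ids into a bit set of
// allowed document numbers. None of these names come from Lucene.
class ContactFilterSketch {
    // docIdLookup is assumed to return -1 when the user has no document.
    static BitSet allowedDocs(Collection<String> userIds,
                              ToIntFunction<String> docIdLookup,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String userId : userIds) {
            // One lookup per contact; with many contacts this loop dominates.
            int docId = docIdLookup.applyAsInt(userId);
            if (docId >= 0 && docId < maxDoc) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

If document numbers were stable, the mapping from user id to document number could be kept between searches instead of being recomputed each time.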

Thanks,

Kay
--
Kay Röpke
http://classdump.org/




