Hi Karl!

On 13. Jan 2007, at 20:12 , karl wettin wrote:


On 13 Jan 2007, at 19:14, Kay Roepke wrote:

All of the users (documents we index) are "connected" to certain other users, in a network fashion. We must be able to restrict the query (or filter it after searching the complete index) to certain "levels of connectedness", i.e. you can search within, say, three hops of yourself. We compute a list of user ids which are in the set of the applicable "contacts". This connection information cannot be stored in the index, as it changes often and is expensive to compute in advance.

Given you don't have too many users, you could do it the other way around, storing
the distance to all users from each document.

Unfortunately (or fortunately, depending on how you look at it ;)), we do have a lot of users.
Precalculation isn't an option with the rate of change in that database.

But the bottom line really is that Lucene is not designed for this sort of thing.

Yeah, exactly what I have been muttering for the last few days now. Nevertheless Lucene is ideal for all the other types of searches we want to provide. This "network search" option is the only thing that really gets in the way, and I'd hate
to have to abandon Lucene for that reason alone.

If I were you, I would make a filter that navigates an in-heap object graph of all users and their connections using a breadth-first search (or perhaps even A*). I'm certain it will turn out to be much simpler to maintain, and probably several thousand times faster than implementing such a feature in the Lucene storage.
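
For illustration, the breadth-first idea above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the class `ContactGraph` and its methods `connect` and `withinHops` are hypothetical names, user ids are assumed to be plain integers, and connections are assumed to be symmetric. None of it is Lucene API.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-heap graph of users; not part of Lucene.
final class ContactGraph {
    // Adjacency list: user id -> ids of directly connected users.
    private final Map<Integer, Set<Integer>> edges = new HashMap<>();

    // Records a symmetric connection between two users (assumption).
    void connect(int a, int b) {
        edges.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        edges.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Returns the ids of all users reachable from start in at most
    // maxHops hops, excluding start itself.
    Set<Integer> withinHops(int start, int maxHops) {
        Set<Integer> visited = new HashSet<>();
        visited.add(start);
        Queue<Integer> frontier = new ArrayDeque<>();
        frontier.add(start);
        // Expand one level of the graph per iteration (breadth first).
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Queue<Integer> next = new ArrayDeque<>();
            for (int user : frontier) {
                for (int neighbour : edges.getOrDefault(user, Collections.emptySet())) {
                    // add() returns false if the user was already seen.
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }
}
```

Calling `withinHops(me, 3)` would yield the set of "contacts" within three hops. Mapping those user ids to Lucene documents is a separate step that this sketch does not cover.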

An option I have considered previously, too. I haven't had time to implement it, though. Anyway, I do have a list of the connected users from a different source; I don't expect to get that information out of Lucene. Sorry if I didn't make that clear in my earlier post. My problem is really only to filter the search results to just include the previously calculated documents.
I would essentially have the same problem with an in-memory graph: I cannot be sure of the Lucene document ids those users will have, so I would have to look those up, exactly as I do now. The bottleneck would remain the same :(
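
To make the lookup problem concrete, here is a rough sketch of one possible shape for it, not a measured solution. It assumes a hypothetical array `userIdByDoc`, indexed by document id and holding the user id stored in each document, that is built once per index reader rather than once per query. The names `ContactBits` and `allowedDocs` are made up for illustration. The result is a `BitSet` of allowed document ids, which I believe is the form a Lucene filter works with.

```java
import java.util.BitSet;
import java.util.Set;

// Hypothetical helper; the per-document user id array is an assumption.
final class ContactBits {
    // userIdByDoc[docId] is the user id stored in that document.
    // contacts is the precomputed set of allowed user ids.
    static BitSet allowedDocs(int[] userIdByDoc, Set<Integer> contacts) {
        BitSet bits = new BitSet(userIdByDoc.length);
        // One linear pass over the documents; no per-user lookup.
        for (int doc = 0; doc < userIdByDoc.length; doc++) {
            if (contacts.contains(userIdByDoc[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The trade-off is that this pass is linear in the number of documents for every query, and the array has to be rebuilt whenever the reader is reopened, so whether it beats looking up each contact individually depends on the index size and the size of the contact list.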

Thanks,

Kay
--
Kay Röpke
http://classdump.org/




