Hi Karl!
On 13. Jan 2007, at 20:12, karl wettin wrote:
> On 13 Jan 2007, at 19:14, Kay Roepke wrote:
>> All of the users (documents we index) are "connected" to certain other
>> users, in a network fashion. We must be able to restrict the query (or
>> filter it after searching the complete index) to certain "levels of
>> connectedness", i.e. you can search within, say, three hops of yourself.
>> We compute a list of user ids which are in the set of applicable
>> "contacts". This connection information cannot be stored in the index,
>> as it changes often and is expensive to compute in advance.
> Given you don't have too many users, you could do it the other way
> around, storing the distance to all users from each document.
Unfortunately (or fortunately, depending on how you look at it ;)), we do
have a lot of users. Precalculation isn't an option with the rate of
change in that database.
> But the bottom line really is that Lucene is not designed for this sort
> of thing.
Yeah, exactly what I have been muttering for the last few days now.
Nevertheless, Lucene is ideal for all the other types of searches we want
to provide. This "network search" option is the only thing that really
gets in the way, and I'd hate to have to abandon Lucene for that reason
alone.
> If I were you, I would make a filter that navigates an in-heap object
> graph of all users and their connections using a breadth-first search
> (or perhaps even A*). I'm certain it will turn out to be much simpler to
> maintain, and probably several thousand times faster than implementing
> such a feature in the Lucene storage.
An option I have considered previously, too. I haven't had time to
implement it, though.
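
For reference, the graph walk suggested above could look roughly like the
sketch below. It is only an illustration under stated assumptions: the
class name ContactGraph, the method withinHops, integer user ids and the
adjacency-map layout are all made up here, and nothing in it touches
Lucene. It expands the graph one level at a time, so the hop limit is
exact.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper, not part of Lucene: collects all user ids within
// maxHops of a start user by breadth-first search over an in-heap graph.
public class ContactGraph {
    // Assumed layout: user id -> ids of directly connected users.
    private final Map<Integer, List<Integer>> edges;

    public ContactGraph(Map<Integer, List<Integer>> edges) {
        this.edges = edges;
    }

    public Set<Integer> withinHops(int start, int maxHops) {
        Set<Integer> seen = new HashSet<>();
        seen.add(start);
        Queue<Integer> frontier = new ArrayDeque<>();
        frontier.add(start);
        // Expand one full level per iteration so the hop count stays exact.
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Queue<Integer> next = new ArrayDeque<>();
            for (int user : frontier) {
                for (int neighbor : edges.getOrDefault(user, Collections.emptyList())) {
                    if (seen.add(neighbor)) { // true only on first visit
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start); // design choice: the searcher is not their own contact
        return seen;
    }
}
```

The result is a set of user ids, not Lucene document numbers, so it still
has to be mapped onto the index afterwards.
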
Anyway, I do have a list of the connected users from a different source;
I don't expect to get that information out of Lucene. Sorry if I didn't
make that clear in my earlier post. My problem is really only to filter
the search results so that they include just the previously calculated
documents.
I would essentially have the same problem with an in-memory graph: I
cannot be sure of the Lucene document ids those users will have, so I
would have to look those up, exactly as I do now. The bottleneck would
remain the same :(
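
To make the lookup step concrete, here is a minimal sketch of turning the
precomputed contact ids into a bit set of permitted document numbers. The
names ContactDocBits, allowedDocs and docIdByUser are hypothetical, and
the Map is only a stand-in for whatever per-user lookup is actually used
against the index; the post does not say how that is done. It shows that
the cost is one lookup per contact, however the contact list was produced.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch: resolves contact user ids to document numbers and
// marks them in a bit set that a result filter could consult.
public class ContactDocBits {
    // docIdByUser is an assumption standing in for the real index lookup.
    public static BitSet allowedDocs(Collection<Integer> contactIds,
                                     Map<Integer, Integer> docIdByUser,
                                     int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Integer userId : contactIds) {
            Integer docId = docIdByUser.get(userId); // one lookup per contact
            if (docId != null) {                     // skip users that are not indexed
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

With many contacts per search, that loop is where the time goes.
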
Thanks,
Kay
--
Kay Röpke
http://classdump.org/