On Sun, Dec 31, 2000 at 03:54:08PM -0500, Chris Anderson wrote:
>
> On Sun, Dec 31, 2000, Adam Langley wrote:
> > On Sun, Dec 31, 2000 at 02:02:31AM -0800, Aaron Voisine wrote:
> > > Here's my idea for keyword searches. I volunteer to implement it.
> >
> > That's very nice of you, and please don't be put off - but you
> > utterly misunderstand much of Freenet.
> >
> > > Each node keeps an index of the KSKs it has in its data store. When
> > > it receives a Request.Search
> >
> > KSKs are binary by the time they hit a node.
> >
>
> That is a big drawback :)
>
> Clients have to insert keyword indices for keyword searches to work
> (doh!). The two methods mentioned in David Wentzlaff's (w/Likuo Lin &
> Alexander Yip) searching paper at
>
> http://cag.lcs.mit.edu/~wentzlaf/classes/6.899/project/public/doc/
>
This is an interesting paper. Are LIFs evil or just hard? They seem similar to
updating; would the two be implemented together?
> are Enumeration and Summaries. There is also mention of Lightweight
> Indirect Files (LIFs) in that paper, but I think the Summaries method is a
> client-side implementation of LIFs. The problem that the Summaries method
> tries to solve is how to create an inverted index of keywords to keys.
> The solution presented was to create an inverted index daily, so that only
> the newest keywords are indexed since the last summary was created. I
> think this is a reasonable property, since the most recent files are often
> the most relevant and also have the highest probability of being cached by
> Freenet. Client-side indexing implementations of searching have another
> desirable property: popular keywords will have indices that are the most
> up to date and the most widely cached.
>
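To make sure I follow the Summaries idea, here is a minimal sketch of the
client-side daily index build. The `build_summary` helper and the in-memory
list of insertions are my own illustration, standing in for real Freenet
inserts and requests:

```python
# Hypothetical sketch of the Summaries method: a client builds an
# inverted index (keyword -> keys) covering only insertions newer
# than the last summary. All names here are illustrative.

from collections import defaultdict

def build_summary(insertions, since):
    """insertions: list of (timestamp, key, keywords) tuples.
    Returns an inverted index of keywords to keys for everything
    inserted after `since` (the previous summary's timestamp)."""
    index = defaultdict(list)
    for ts, key, keywords in insertions:
        if ts > since:
            for kw in keywords:
                index[kw].append(key)
    return dict(index)
```

Each day's summary would then itself be inserted under a predictable key, so
only the newest keywords need indexing since the last run.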
> I think I would combine the notions of Enumeration and Summaries into a
> single index that is updated like an Enumeration but ages like a Summary.
> Let's say a document has the keywords mp3 and metallica. Updating these
> keywords would be performed as in the Enumeration method: a binary search
> on the keyword (search for mp3.0, mp3.100, mp3.200...). For each collision on
> the keyword, keep a list of unique keys sorted by a timestamp. When an
> unused Enumeration is found, insert the top N keywords (sorted by
> timestamp) along with the new keyword. So, every Enumeration will contain
> an estimate of the N newest references:
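If I read this right, it could be sketched roughly as below. The slot naming
(keyword.0, keyword.100, ...), the `find_free_slot` helper, and the plain dict
standing in for Freenet inserts/requests are all my own assumptions, not
anything from the paper:

```python
# Hypothetical sketch of the combined Enumeration/Summary index:
# binary-search for the first unused enumeration slot, then insert
# the N newest references (by timestamp) along with the new one.

store = {}  # simulated keyword store: enumeration name -> [(timestamp, key)]

def enum_name(keyword, slot, step=100):
    return f"{keyword}.{slot * step}"

def find_free_slot(keyword, limit=1024):
    """Binary search for the first unused enumeration slot,
    assuming slots fill contiguously from 0."""
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi) // 2
        if enum_name(keyword, mid) in store:
            lo = mid + 1
        else:
            hi = mid
    return lo

def insert_reference(keyword, key, timestamp, n=5):
    slot = find_free_slot(keyword)
    # gather previously indexed references, then keep the N newest
    refs = []
    for s in range(slot):
        refs.extend(store[enum_name(keyword, s)])
    refs.append((timestamp, key))
    refs.sort(reverse=True)  # newest first
    store[enum_name(keyword, slot)] = refs[:n]
```

This way the latest enumeration always carries an estimate of the N newest
references for its keyword, which matches the aging behavior of a Summary.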
_______________________________________________
Freenet-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/mailman/listinfo/freenet-dev