[freenet-devl] Freenet - The Next Generation : Fuzzy Searching

Ian Clarke Sat, 2 Jun 2001 22:47:01 -0700

It seems to have been a long time since we had some more long-term
discussions about Freenet, and features for the future.  While it is
good that we have focussed on more short-term issues (hell, we have
*users* now!), I think it is about time that a few people started
concentrating on longer-term more ambitious stuff too.

It is the popular conception that fuzzy searching in Freenet would not
be possible, yet early-on we discussed a proposal which could achieve
just this in an efficient and elegant manner.

To refresh our memories - here it is:

Currently a key in Freenet consists, essentially, of a string of text.
We define a "closeness" function between two keys, so that given three
keys, A, B, and C, we can determine which of A or B is closer to C. To
do this we have chosen a lexographic comparison, so that "aardvark" is
closer to "apple" than "zebra" is. It was apparent, even when writing
my original paper describing the Freenet architecture, that much more
complex keys could be used with correspondingly more sophisticated
closeness operations.

So how does this help up with Fuzzy searching? Well consider we defined
a new key-type, called a MetadataKey, which rather than just a single
string of text, it consisted of a number of key-data pairs, such as:

"artist" => "tori amos"
"album" => "little earthquakes"
"song" => "winter"
"year" => "1988"

Now, lets say that we wanted to search for an mp3 which was stored under
such a key. We could define a search like this:

("artist" string= "tori amos") AND
("album" contains "litt") AND
("song" contains "w") AND
("year" lessThan "2000")

A node receives this search, and compares it to each of the MetadataKeys
in its datastore. It uses fuzzy logic to come up with a value between 0
and 1 for how closely each MetadataKey matches this search (I have
thought this out in more detail but for brevity won't explain here,
do a web search for "fuzzy logic", and "Levenshtein distance"), a
perfect match being 1. The search request is then forwarded to the
referece associated with the closest match. Once the HTL runs out, a
SearchResponse message is sent which contains, at any given time, the
top, say, 10 matches for the query, along with the CHK of the data they
refer to. Each node updates this as they pass the search request back to
the requester. The requester can then choose which of these they wish
to request using conventional Freenet messages.

Of course, we do lose some nice features which normal Freenet messages
have here - since node operators can now read the metadata in their
datastores, it is no-longer encrypted. We also lose the randomization
effect that we get from hashing keys. However, given that this is only
search information, is much smaller than most data and therefore
node-operator censorship is less important since it can be more widely
distributed, I don't think that this is a serious problem. Also, this
mechanism will augment the existing Freenet, not replace anything (the
datastore for the MetadataKeys can be kept separate).

Implementation of this will dramatically reduce the functional
difference between Freenet and something like Google, Napster, or
Gnutella, while retaining Freenet's scalability.

I plan to get started on implementation of this, with a goal to
integrate it into Freenet 0.6.

Ian.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
URL:
<https://emu.freenetproject.org/pipermail/devl/attachments/20010602/0a1d6870/attachment.pgp>

[freenet-devl] Freenet - The Next Generation : Fuzzy Searching

Reply via email to