OK ... here is a proposal for searching, metadata, encryption, CHK, KHK all 
rolled up into one ... please bear
with me here. It is long and I try to go over the whole process even if things 
haven't changed. I will explain
an insert and all the possible requests and then the consequences/benefits 
of this method.

The Insert:

Data is encrypted with the plain key, and then some header data is included with 
the data (separate from the
message header). This data header would likely include the encryption method 
and any other things that the
author felt were necessary to include ... the details here are not important 
(*flexibility*).
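
To make the shape of the insert concrete, here is a rough Java sketch of the 
payload: the encrypted data plus the author-supplied data header. The field 
names (and the idea of carrying the encryption method as a plain string) are 
just my assumptions for illustration; the proposal deliberately leaves the 
header contents open.

    import java.util.Map;

    // Hypothetical sketch of the insert payload: the encrypted data plus a small
    // data header that travels with it (distinct from the message header).
    // Field names are illustrative only; the proposal leaves the contents open.
    public class InsertPayload {
        public final byte[] encryptedData;   // data encrypted with the plain key
        public final DataHeader header;      // author-supplied header stored with the data

        public InsertPayload(byte[] encryptedData, DataHeader header) {
            this.encryptedData = encryptedData;
            this.header = header;
        }

        public static class DataHeader {
            public final String encryptionMethod;     // e.g. the cipher the author used
            public final Map<String, String> extra;   // anything else the author wants

            public DataHeader(String encryptionMethod, Map<String, String> extra) {
                this.encryptionMethod = encryptionMethod;
                this.extra = extra;
            }
        }
    }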

Nothing new so far ...

The plain key is hashed to form the KHK, and the data is then routed using the KHK 
when it is inserted, as usual.
Same as what occurs now ... exactly.
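
For concreteness, deriving the KHK could look something like this; the choice 
of SHA-1 is only an assumption on my part, the proposal doesn't name a hash 
function.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Sketch of deriving the routing key (KHK) from the guessable plain key.
    // SHA-1 is an assumption for illustration; any agreed-upon hash would do.
    public class Khk {
        public static byte[] fromPlainKey(String plainKey) throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(plainKey.getBytes(StandardCharsets.UTF_8));
        }
    }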

When the nodes store the data, however, they do something a bit different. 
Before they save the data (or check
for a collision), they make a CHK of the data, save the data as normal, and index 
it under hash(KHK:CHK) (i.e. the
hash of a concatenation of the KHK and the CHK).  This may seem kind of silly, 
but it will make more sense on the
request side of things. For future ease of processing, the CHK could be stored 
along with the data and the
H(KHK:CHK).
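
Here is a rough sketch of that store-side step under the same assumptions 
(SHA-1, and KHK bytes simply concatenated with CHK bytes): compute the CHK over 
the incoming data+header, index the record under H(KHK:CHK), and keep the CHK 
and the index alongside the data for later.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the store-side indexing described above. SHA-1 and the exact
    // concatenation order (KHK bytes followed by CHK bytes) are assumptions
    // made for illustration; they are not specified in the proposal.
    public class DataStore {

        // key = hex of H(KHK:CHK); value = the stored record
        private final Map<String, StoredRecord> records = new HashMap<>();

        public void store(byte[] khk, byte[] dataWithHeader) throws NoSuchAlgorithmException {
            byte[] chk = hash(dataWithHeader);                   // CHK = hash of data + data header
            String indexHex = toHex(hash(concat(khk, chk)));     // H(KHK:CHK)
            // Keep the CHK and the index with the data "for future ease of processing".
            records.put(indexHex, new StoredRecord(dataWithHeader, chk, indexHex));
        }

        public StoredRecord lookup(byte[] khk, byte[] chk) throws NoSuchAlgorithmException {
            return records.get(toHex(hash(concat(khk, chk))));
        }

        public Collection<StoredRecord> allRecords() {
            return records.values();
        }

        public static byte[] hash(byte[] input) throws NoSuchAlgorithmException {
            return MessageDigest.getInstance("SHA-1").digest(input);  // hash choice is an assumption
        }

        public static byte[] concat(byte[] a, byte[] b) {
            byte[] out = new byte[a.length + b.length];
            System.arraycopy(a, 0, out, 0, a.length);
            System.arraycopy(b, 0, out, a.length, b.length);
            return out;
        }

        public static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }

        public static class StoredRecord {
            public final byte[] dataWithHeader;
            public final byte[] chk;
            public final String indexHex;   // hex of H(KHK:CHK), kept alongside the data

            public StoredRecord(byte[] dataWithHeader, byte[] chk, String indexHex) {
                this.dataWithHeader = dataWithHeader;
                this.chk = chk;
                this.indexHex = indexHex;
            }
        }
    }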

The Request:

the specific request:
Someone is interested in a particular file that they found the reference for 
from a trusted source (be that
within or outside of freenet). This reference would include the plain-text key 
and the CHK (aka checksum). So
their client makes up a data request and sends it off. This data request is 
routed using the KHK but also
contains the CHK in its message header. This data request would be smartly 
routed to the data.
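
The request message itself would then only need something like the following 
shape; this is a hypothetical sketch, the only point being that the CHK is 
optional.

    // Hypothetical shape of the data request: routed by KHK, with an optional CHK
    // in the message header and the usual hops-to-live counter.
    public class DataRequest {
        public final byte[] khk;   // routing key, always present
        public final byte[] chk;   // null for a general (search) request
        public int htl;            // hops to live

        public DataRequest(byte[] khk, byte[] chk, int htl) {
            this.khk = khk;
            this.chk = chk;
            this.htl = htl;
        }

        public boolean isSpecific() {
            return chk != null;
        }
    }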

At the node, the node sees that the data request has both the KHK and the CHK 
included in it. It hashes the
two together and then checks whether it has that file in its inventory 
(*content confirmation*). If it does, it
returns it; if it doesn't, it forwards the data request untouched 
except for the HTL decrement.
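
Tying that together, the per-node handling of a specific request might look 
like this, reusing the hypothetical DataStore and DataRequest sketches above. 
forwardToNextNode() is just a stand-in for whatever routing the node already 
does.

    // Sketch of handling a specific request at a node, reusing the hypothetical
    // DataStore and DataRequest sketches above.
    public class SpecificRequestHandler {
        private final DataStore store;

        public SpecificRequestHandler(DataStore store) {
            this.store = store;
        }

        public DataStore.StoredRecord handle(DataRequest req) throws Exception {
            DataStore.StoredRecord hit = store.lookup(req.khk, req.chk); // H(KHK:CHK) lookup
            if (hit != null) {
                return hit;                  // content confirmation: we hold exactly this data
            }
            req.htl--;                       // forward untouched except for the HTL decrement
            if (req.htl <= 0) {
                return null;                 // give up; a real node would send a "not found" back
            }
            return forwardToNextNode(req);   // placeholder for the node's normal routing
        }

        private DataStore.StoredRecord forwardToNextNode(DataRequest req) {
            // Assumption: the existing routing layer would take over here.
            return null;
        }
    }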

Once again not much new.

the general request (aka search):
Someone is interested in a general subject (say MP3s) and maybe even a specific 
topic (say a particular music
group). They make an attempt at guessing a key. This may just be a keyword. 
Since they don't know exactly what
they are looking for, they certainly don't have a CHK, so they send off a data 
request without one. This request
is smartly routed using the KHK.

At the node, the node sees that there is no CHK included, so it knows that the 
request is not looking for a specific
file. So it takes the supplied KHK and hashes it with the CHKs from its store 
one at a time and checks for a
match. Each time it finds a match, it will take the metadata header from the 
data in question and compile a list
of metadata for data that has a matching plain key.  In the list it will include 
the CHKs of that data as well, so
that the user can specifically request that data after viewing the metadata. Each 
time it finds a match it will
decrease the HTL, until it is zero. If it exhausts its search of the store 
before the HTL reaches zero, it forwards
the request to the next node, including the CHKs found in its store (but not the 
metadata list). This list can be
locally stored until the next node responds back with its data after the search 
is exhausted, or the node can
send its list back along the chain, letting the nodes know that the HTL hasn't 
reached zero yet.
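
Here is a sketch of that per-node search loop, again reusing the hypothetical 
DataStore and DataRequest sketches above. It also shows the "don't count 
previously found CHKs" rule that the next paragraph describes for downstream 
nodes.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // Sketch of the general (no-CHK) search at a node. It walks the local store,
    // hashes the supplied KHK with each stored CHK, and treats a record as a match
    // when that H(KHK:CHK) equals the index the record was stored under. For each
    // new match it collects the metadata header and the CHK and spends one hop of
    // HTL. "alreadyFound" holds CHKs reported by upstream nodes so they are
    // neither counted nor returned again.
    public class GeneralSearchHandler {

        public static class Match {
            public final byte[] chk;
            public final byte[] metadataHeader;
            public Match(byte[] chk, byte[] metadataHeader) {
                this.chk = chk;
                this.metadataHeader = metadataHeader;
            }
        }

        public List<Match> search(DataStore store, DataRequest req, Set<String> alreadyFound)
                throws Exception {
            List<Match> matches = new ArrayList<>();
            for (DataStore.StoredRecord rec : store.allRecords()) {
                if (req.htl <= 0) break;                          // stop once HTL is spent
                String chkHex = DataStore.toHex(rec.chk);
                if (alreadyFound.contains(chkHex)) continue;      // skip CHKs found upstream
                String candidateIndex =
                    DataStore.toHex(DataStore.hash(DataStore.concat(req.khk, rec.chk)));
                if (candidateIndex.equals(rec.indexHex)) {        // same plain key => match
                    matches.add(new Match(rec.chk, extractMetadataHeader(rec)));
                    alreadyFound.add(chkHex);
                    req.htl--;                                    // each new match costs one hop
                }
            }
            // If req.htl is still > 0 here, the node would forward the request onward
            // together with alreadyFound (but not the metadata list itself).
            return matches;
        }

        private byte[] extractMetadataHeader(DataStore.StoredRecord rec) {
            // Placeholder: splitting the data header back out depends on the stored format.
            return rec.dataWithHeader;
        }
    }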

At the next node, the search continues, but this node knows not to include 
previously found CHKs in its list. It
will only decrement the HTL for every new CHK it finds, and it will compile its 
own list like the first node.
This goes on until the HTL is zero. This last message gets sent back, either 
collecting the other lists as it
goes along, or letting the nodes know that the HTL reached zero in making this 
list.

The client receives all this metadata under the same KHK and displays it to the 
user to allow them to narrow
their search or extend the HTL (maybe even including the rejected CHKs in a new 
general data request like the
nodes did when they forwarded the request) so that new matches can be returned.

If you are with me so far ... thanks for hanging in there ...

None of this is terribly new, and it has all been discussed on the mailing list, 
but I figured that it would be
useful to bring some of these concepts together into an infrastructure that 
would solve some of the shortcomings
of the current freenet.

The Consequences:
- nothing is stopping someone from inserting a message under a particular KHK 
with nothing but metadata. It
could be references to new versions, critiques of particular CHK data, etc.

- this can be used as a very general search mechanism where people can insert 
references to their data under
keywords relevant to their insert.

- new versions can be inserted under the same KHK, which could identify 
themselves as such in the metadata

- many different client-specific metadata schemes could be implemented, and the 
clients that recognize their
particular format would have enhanced functionality (author verification, 
superwhammy encryption, private
messages, bulletin boards, etc.)

- the concept of guessable keys is preserved

- the concept of targeted CHK requests is introduced while remaining 
compatible with KHKs

- dumb data doesn't get voted for, since retrieving the metadata in a general 
request doesn't count as a vote for
the data. Only specific requests vote for the data.

- valid metadata can get voted for by specifically requesting it (rather than 
it being part of a general
request)

- the metadata directly attached to the data dies with the data. All of the 
other descriptive pure-metadata
files will likely not stick around after the main data file has disappeared, 
since retrieving them with a
specific data request will likely never happen.

- you could have "sentry nodes" scattered about the freenet checking the 
validity of the CHK in data replies
since it knows what was requested and what was sent back. cancer nodes could be 
weeded out (or at least
discouraged) this way.

- since the CHK is that of the data + metadata header, specific requests will not 
give away to a snooper which file the user is
requesting. The metadata header could contain some random blob from the 
original insert to change
the CHK. So if someone was snooping for the insert or request of a specific 
(prosecutable) file, they would have
a very, very hard time guessing beforehand what the CHK for it would be. Tack on 
the fact that it could be
encrypted in any fashion, and you've made it pretty much impossible.
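
To make those last two points concrete, here is a small sketch: the CHK is 
computed over a random blob chosen at insert time plus the data header plus the 
encrypted data, so it can't be predicted from the plain file; and a sentry node 
can check a reply simply by recomputing the CHK of what came back and comparing 
it to what was requested. SHA-1 and the exact layout are my assumptions, as 
before.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.security.SecureRandom;
    import java.util.Arrays;

    // Sketch of the unpredictable-CHK idea and the sentry check. The CHK is taken
    // over a random blob picked at insert time, followed by the data header and the
    // encrypted data, so an outside observer cannot compute it from the plain file
    // in advance. A sentry node only needs the requested CHK and the returned bytes
    // to verify a reply.
    public class ChkTools {

        // Build the bytes the CHK is computed over: random blob + header + encrypted data.
        public static byte[] saltedPayload(byte[] headerBytes, byte[] encryptedData) {
            byte[] blob = new byte[16];                 // random blob chosen by the inserter
            new SecureRandom().nextBytes(blob);
            byte[] out = new byte[blob.length + headerBytes.length + encryptedData.length];
            System.arraycopy(blob, 0, out, 0, blob.length);
            System.arraycopy(headerBytes, 0, out, blob.length, headerBytes.length);
            System.arraycopy(encryptedData, 0, out,
                             blob.length + headerBytes.length, encryptedData.length);
            return out;
        }

        public static byte[] chkOf(byte[] payload) throws NoSuchAlgorithmException {
            return MessageDigest.getInstance("SHA-1").digest(payload);  // hash choice is an assumption
        }

        // What a sentry node would do with a data reply: recompute and compare.
        public static boolean sentryCheck(byte[] requestedChk, byte[] returnedPayload)
                throws NoSuchAlgorithmException {
            return Arrays.equals(requestedChk, chkOf(returnedPayload));
        }
    }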

Comments?

Mike




