OK ... here is a proposal for searching, metadata, encryption, CHK, and KHK all rolled up into one ... please bear with me here. It is long, and I try to go over the whole process even where things haven't changed. I will explain an insert, then all the possible requests, and then the consequences/benefits of this method.

The Insert:

Data is encrypted with the plain key, and then some header data is included with the data (separate from the message header). This data header would likely include the method of encryption and anything else the author felt was necessary to include ... the details here are not important (*flexibility*). Nothing new so far ...

The plain key is hashed to form the KHK, and the data is routed by the KHK when it is inserted, as usual. Same as what occurs now ... exactly.

When the nodes store the data, however, they do something a bit different. Before they save the data (or check for a collision), they make a CHK of the data (data header included), save the data as normal, and index it under hash(KHK:CHK) (i.e. the hash of a concatenation of the KHK and the CHK). This may seem kind of silly, but it will make more sense on the request side of things. For future ease of processing, the CHK could be stored along with the data and the H(KHK:CHK).

The Request:

The specific request:

Someone is interested in a particular file whose reference they found from a trusted source (be that within or outside of Freenet). This reference would include the plain-text key and the CHK (aka checksum). So their client makes up a data request and sends it off. This data request is routed using the KHK but also carries the CHK in its message header, so it is smartly routed to the data. At each node, the node sees that the data request includes both the KHK and the CHK. It hashes the two together and checks whether it has that file in its inventory (*content confirmation*). If it does, it returns the file; if it doesn't, it forwards the data request untouched except for the HTL decrement. Once again, not much new.

The general request (aka search):

Someone is interested in a general subject (say mp3s) and maybe even a specific topic (say a particular music group). They make an attempt at guessing a key. This may just be a keyword.
Since they don't know exactly what they are looking for, they certainly don't have a CHK, so they send off a data request without one. This request is smartly routed using the KHK. At the node, the node sees that no CHK is included, so it knows that the request is not for a specific file. So it takes the supplied KHK, hashes it with the CHKs from its store one at a time, and checks each result against the H(KHK:CHK) index that the data is stored under. Each time it finds a match, it takes the metadata header from the data in question and adds it to a list of metadata for data with a matching plain key. The list also includes the CHKs of that data, so that the user can specifically request the data after viewing the metadata.

Each time the node finds a match it decreases the HTL, until the HTL is zero. If it exhausts its search of the store before the HTL reaches zero, it forwards the request to the next node, including the CHKs found in its own store (but not the metadata list). The metadata list can either be held locally until the next node responds once its search is exhausted, or be sent back along the chain right away, letting the earlier nodes know that the HTL hasn't reached zero yet.

At the next node the search continues, but this node knows not to include previously found CHKs in its list. It decrements the HTL only for every new CHK it finds, and it compiles its own list like the first node did. This goes on until the HTL is zero. The last message gets sent back, either collecting the other lists as it goes along or letting the nodes know that the HTL was reached in making this list.

The client receives all this metadata under the same KHK and displays it to the user, allowing them to narrow their search or extend the HTL (maybe even including the rejected CHKs in a new general data request, as the nodes did when they forwarded the request) so that new matches can be returned.

If you are with me so far ... thanks for hanging in there ...

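
To make the store index and the two request types a bit more concrete, here is a minimal Python sketch. Everything named in it is an assumption made for illustration and not part of the proposal or of any Freenet code: SHA-256 as the hash, plain byte concatenation for KHK:CHK, the `Entry` and `Node` classes, and a simple linear chain of nodes (the `search` function) standing in for real routing and reply handling.

```python
import hashlib
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    """Hash helper. SHA-256 is an assumption; the proposal names no algorithm."""
    return hashlib.sha256(data).digest()


@dataclass
class Entry:
    chk: bytes      # content hash of data header + encrypted data
    header: bytes   # metadata header that travels with the data
    payload: bytes  # data encrypted with the plain key


@dataclass
class Node:
    store: dict = field(default_factory=dict)  # H(KHK:CHK) -> Entry

    def insert(self, khk: bytes, header: bytes, payload: bytes) -> bytes:
        """Store data under H(KHK:CHK) and keep the CHK next to it."""
        chk = h(header + payload)
        self.store[h(khk + chk)] = Entry(chk, header, payload)
        return chk

    def specific_request(self, khk: bytes, chk: bytes):
        """Request carrying KHK and CHK: one hash, one lookup."""
        return self.store.get(h(khk + chk))

    def general_request(self, khk: bytes, htl: int, seen: set):
        """Request carrying only the KHK: try it against every stored CHK.

        Returns the (CHK, metadata header) pairs found here and the HTL
        that is left. The HTL drops only for CHKs not already in `seen`.
        """
        matches = []
        for index, entry in self.store.items():
            if htl == 0:
                break
            if entry.chk in seen:
                continue
            if h(khk + entry.chk) == index:
                matches.append((entry.chk, entry.header))
                htl -= 1
        return matches, htl


def search(chain, khk: bytes, htl: int):
    """Walk a chain of nodes, passing along the CHKs found so far."""
    seen, results = set(), []
    for node in chain:
        if htl == 0:
            break
        matches, htl = node.general_request(khk, htl, seen)
        seen.update(chk for chk, _ in matches)
        results.extend(matches)
    return results


if __name__ == "__main__":
    khk = h(b"mp3s")
    a, b = Node(), Node()
    a.insert(khk, b"title=song one", b"ciphertext-1")
    b.insert(khk, b"title=song two", b"ciphertext-2")
    for chk, header in search([a, b], khk, htl=5):
        print(chk.hex()[:12], header.decode())
```

Note that in this sketch a general request costs one hash per stored item on every node it visits, while a specific request costs a single hash and a lookup. Whether the metadata lists are collected on the way back or sent early is left out here; `search` simply gathers them in one place.
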
None of this is terribly new, and it has all been discussed on the mailing list, but I figured that it would be useful to bring some of these concepts together into an infrastructure that would solve some of the shortcomings of the current Freenet.

The Consequences:

- nothing is stopping someone from inserting a message under a particular KHK with nothing but metadata. It could be references to new versions, critiques of particular CHK data, etc.
- this can be used as a very general search mechanism where people insert references to their data under keywords relevant to their insert.
- new versions can be inserted under the same KHK and could identify themselves as such in the metadata.
- many different client-specific metadata schemes could be implemented, and the clients that recognize their particular format would have enhanced functionality (author verification, superwhammy encryption, private messages, bulletin boards, etc.)
- the concept of guessable keys is preserved.
- the concept of targeted CHK requests is introduced while remaining compatible with KHKs.
- dumb data doesn't get voted for, since retrieving the metadata in a general request doesn't count as a vote for the data. Only specific requests vote for the data.
- valid metadata can get voted for by specifically requesting it (rather than it being part of a general request).
- the metadata directly attached to the data dies with the data. All of the other descriptive pure-metadata files will likely not stick around after the main data file has disappeared, since retrieving them with a specific data request will likely never happen.
- you could have "sentry nodes" scattered about Freenet checking the validity of the CHK in data replies, since they know what was requested and what was sent back. Cancer nodes could be weeded out (or at least discouraged) this way.
- since the CHK is that of the data+metadata header, specific requests will not give away to a snooper which file the user is requesting.
The metadata header could contain some random blob from the original insert to change the CHK, so if someone was snooping for the insert or request of a specific (prosecutable) file, they would have a very, very hard time guessing beforehand what its CHK would be. Tack on the fact that it could be encrypted in any fashion and you have made it pretty much impossible.

Comments?

Mike

_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev
