I want to reiterate a comment I made earlier about storing things in Freenet under a "searchkey" like mp3. This is not going to work, because too many documents will use that keyword, and they will all try to go onto that one node. Even if the "documents" are just index or metadata entries, there are too many of them.
I like the idea of storing documents under keywords, and doing "searches" by specifying a primary keyword (which is used for routing) plus secondary keywords which must also be matched in the node before the doc is returned. I also like the idea of returning multiple hits in the form of index/metadata records which then point at a CHK or other unambiguous specifier to pull the actual data.
But if we do this, the primary keyword can't be a common word like "mp3". It would be OK as a secondary keyword, but as a primary it would be too common. You could search with primary keyword = "backstreet boys" and secondary = "mp3", and that wouldn't have so many collisions. Even so, storing an index entry for every single document on the net that is about the backstreet boys could still overload the node.
I therefore proposed that it would be useful to have the primary key (the searchkey used for routing) be a combination of keywords in some canonical ordering (alphabetical, separated by slashes, or whatever other convention we adopt). So the primary keyword could be "backstreet boys/mp3". This would put even less load on any one small set of nodes, and would only find mp3s by the backstreet boys.
What I proposed was that nodes would simply refuse to accept data if they already have too many entries with the same primary searchkey as the incoming one. So an attempt to insert an entry under a searchkey of "mp3" would simply (and perhaps silently) fail, since the node would already have too many such entries. Using "backstreet boys/mp3" would be more likely to succeed, but even that might be too much crowding for some nodes. Using "backstreet boys/i want it that way/mp3" would be much less likely to collide.
There would therefore be a disadvantage to using primary searchkeys that are common. They would be unlikely to be retained on insert, and therefore unlikely to return all the possible hits on retrieve.
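
To make the proposal concrete, here is a rough Python sketch of a single node that builds the canonical primary searchkey and refuses inserts once a key is too crowded. Everything in it is an assumption for illustration only: the names `canonical_searchkey`, `Node` and `max_entries`, the lowercasing, the cap value, and the "CHK-..." strings standing in for real pointers. None of it is existing Freenet code, and routing between nodes is left out entirely.

```python
from collections import defaultdict

# Hypothetical cap; the proposal does not say how many entries is "too many".
MAX_ENTRIES_PER_KEY = 3


def canonical_searchkey(keywords):
    """Build one primary searchkey from several keywords.

    Assumed convention: lowercase, drop duplicates and blanks,
    sort alphabetically, join with slashes.
    """
    cleaned = sorted({k.strip().lower() for k in keywords if k.strip()})
    return "/".join(cleaned)


class Node:
    """Toy single node holding index entries grouped by primary searchkey."""

    def __init__(self, max_entries=MAX_ENTRIES_PER_KEY):
        self.max_entries = max_entries
        # primary searchkey -> list of (secondary keywords, pointer) entries
        self.entries = defaultdict(list)

    def insert(self, primary_keywords, secondary_keywords, pointer):
        """Store an index entry; return False if the key is already too crowded."""
        key = canonical_searchkey(primary_keywords)
        bucket = self.entries[key]
        if len(bucket) >= self.max_entries:
            return False  # refuse the insert (a real node might do so silently)
        tags = frozenset(s.strip().lower() for s in secondary_keywords)
        bucket.append((tags, pointer))
        return True

    def search(self, primary_keywords, secondary_keywords=()):
        """Return pointers under the primary key whose entries carry every
        requested secondary keyword."""
        key = canonical_searchkey(primary_keywords)
        wanted = {s.strip().lower() for s in secondary_keywords}
        return [ptr for tags, ptr in self.entries.get(key, []) if wanted <= tags]
```

With `Node(max_entries=2)`, a third insert under the primary keyword "mp3" is refused, while an insert under the keywords "mp3" and "backstreet boys" (which canonicalize to "backstreet boys/mp3") lands in its own, emptier bucket and is still accepted.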
Using more combinations of keywords as primary searchkeys would make the data more likely to be available on Freenet, but at the cost of requiring users to specify several keywords in order to find the data.
Hal
