On Tue, Oct 09, 2001 at 08:37:53AM -0700, Ian Clarke wrote:
> On Mon, Oct 08, 2001 at 09:29:15PM -0500, thelema wrote:
> > > Inserting large files is pointless anyway until we get redundant
> > > splitfiles (see GJ's freesite for why).
> >
> > Weren't you one of the people insisting that we wouldn't need redundancy
> > for splitfiles?
>
> I changed my mind - would you prefer that I stick to a viewpoint that I
> have discovered is incorrect?
>
> > Why does having something published in freenet change your mind about
> > this topic?
>
> It is nothing to do with having something published in Freenet, it is
> the simple mathematics of it. See GJs freesite.
<snip/>
> Ian.

I don't know if this has been mentioned before (it certainly has), but
there could be a security benefit to redundancy in splitfiles.  If the
files were split according to a secret sharing algorithm (see
http://www.best.com/~szabo/secret.html, among many others), then a node
operator *COULD NOT* know what the content meant unless he had enough
parts of the file.  In that case it would not be necessary to encrypt
datastores.

The idea, for the few who are unfamiliar with it, is that the file is
split into n parts, and with any k of those parts you can reconstruct
the original file.  With any k-1 parts you get *nothing*.  This is
information-theoretic security, not computational security (think
one-time pad vs. RSA): if you have k-1 parts and all the computing power
in the universe, you still don't get the original file.  This could be
useful for Freenet.  (A toy sketch of such a scheme is in the P.S.
below.)

Those who have said that splitfiles will not need redundancy are, I
believe, incorrect.  I have not looked at GJ's site in 0.4 (I don't have
a 0.4 node running, sorry), but I am sure his argument goes like this:
if the probability of not finding a file is p, and files are split into
constant-sized chunks of maximum size B, then a file of size S will be
split into N = S/B chunks.  The probability of downloading all of these
chunks successfully is (1-p)^N, which is ((1-p)^(1/B))^S.  To take an
example: with a block size of 2 MB and S = 650 MB (an ISO image),
N = 325.  If Freenet requests fail with probability 1%, the probability
of getting the entire file is 0.99^325, which is about 0.038, or 3.8%.
Not bloody likely.

If, on the other hand, some coding is done (perhaps the secret sharing
above, or a better scheme), Freenet may be viewed as an erasure channel.
As anyone who has studied channel coding can tell you, the capacity of
such a channel is 1-e, where e is the probability of information being
erased.  No exponentially bad scaling is necessary.  Saying redundancy
is not needed is like telling cell phone makers not to use
error-correcting codes, which "waste" capacity, and instead just to make
the data transmission perfect.  Guess what: it's not going to happen,
and Freenet is not going to deliver files with probability 1 either.  On
the other hand, for a modest cost in bits (in the limit you must store
(1/(1-p)) * filesize, which is about (1+p) * filesize for small p) you
*CAN* get perfect reliability, or exponentially close to it.  (This is
Shannon's channel coding theorem.)

To sum it up: in the above example, with a Freenet that has a 99%
success rate and 2 MB splitfile blocks, an ISO image can be successfully
downloaded with probability of roughly 3.8%.  Using an erasure code, on
the other hand, one could make the file only slightly bigger (just over
1% in the limit of many blocks) and have a probability exponentially
close to 1 of successfully downloading it; the numbers are spelled out
in the P.P.S. below.  Which do you think is best?

Oscar.

-- 
boykin at pobox.com
http://pobox.com/~boykin
ICQ: 5118680
Key fingerprint = 159A FA02 DF12 E72F B68F 5B2D C368 3BCA 36D7 CF28
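
P.S.  For the curious, here is a toy sketch of k-of-n secret sharing
(Shamir-style polynomial sharing over a prime field) in Python, just to
illustrate the "k-1 shares tell you nothing" property.  The field size,
the chunking, and all the names below are my own invention for the
example, not anything Freenet actually does.

import random

# A Mersenne prime; a real implementation would chunk file bytes into
# field elements smaller than this.
PRIME = 2**127 - 1

def make_shares(secret, k, n):
    """Split an integer secret (0 <= secret < PRIME) into n shares such
    that any k of them reconstruct it.  Pick a random polynomial of
    degree k-1 whose constant term is the secret; share i is the point
    (i, poly(i)) modulo PRIME."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):      # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(i, poly(i)) for i in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over the prime field; needs at
    least k shares with distinct x coordinates."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj == xi:
                continue
            num = (num * -xj) % PRIME
            den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

if __name__ == "__main__":
    secret = 123456789
    shares = make_shares(secret, k=3, n=5)
    print(recover(shares[:3]) == secret)                  # True
    print(recover(random.sample(shares, 3)) == secret)    # True
    # With only 2 shares the interpolation still returns *some* number,
    # but it carries no information whatsoever about the secret.

Note that each share here is as large as the secret itself, so sharing a
whole file this way multiplies storage by n; a real design would weigh
that against the much cheaper erasure-coding overhead discussed above.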
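
P.P.S.  And the arithmetic from the message, spelled out.  The values of
p, B and S are the ones used above; the specific (n, k) erasure-code
parameters are my own illustrative choice, not anything GJ proposed.
The "just over 1% bigger" figure in the text is the Shannon limit for
very many blocks; for a concrete 325-block file I use roughly 11%
overhead below, which already pushes the success probability
indistinguishably close to 1.

from math import comb

p = 0.01          # per-request failure probability assumed above
B = 2             # block size in MB
S = 650           # file size in MB (the ISO image example)
k = S // B        # 325 data blocks

# Plain splitfile: every single block must be fetched.
print("no redundancy:", (1 - p) ** k)        # ~0.038, i.e. about 3.8%

def p_success(n, k, p):
    """P(at least k of n independent block fetches succeed)."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i)
               for i in range(k, n + 1))

# (n, k) erasure code: any k of the n stored blocks rebuild the file.
n = 360           # ~11% overhead; expected failures 3.6, tolerated 35
print("erasure coded:", p_success(n, k, p))  # indistinguishable from 1.0

The per-block failure rate is unchanged; the only difference is needing
"at least k of n" blocks instead of "all N of N", which turns an
exponentially small success probability into one exponentially close
to 1.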
