On Tue, Oct 09, 2001 at 08:37:53AM -0700, Ian Clarke wrote:
> On Mon, Oct 08, 2001 at 09:29:15PM -0500, thelema wrote:
> > > Inserting large files is pointless anyway until we get redundant
> > > splitfiles (see GJ's freesite for why).
> > > 
> > Weren't you one of the people insisting that we wouldn't need redundancy
> > for splitfiles?
> 
> I changed my mind - would you prefer that I stick to a viewpoint that I
> have discovered is incorrect?
> 
> > Why does having something published in freenet change your mind about
> > this topic?
> 
> It is nothing to do with having something published in Freenet, it is
> the simple mathematics of it.  See GJs freesite.

<snip/>

> Ian.

I don't know if this has been mentioned before (it probably has), but
there could be a security benefit to redundancy in splitfiles.  If the
files were split according to a secret sharing algorithm:

see: http://www.best.com/~szabo/secret.html (and many others)

then a node operator *COULD NOT* know what the content meant unless he
had enough parts of the file.

In that case, it would not be necessary to encrypt datastores.

The idea, for those few unfamiliar, is that the file is split into n
parts, and with any k parts you can reconstruct the original file.
With any k-1 parts, you get *nothing*.  This is information-theoretic
security, not computational security (think one-time pad vs. RSA).  If
you have k-1 parts and all the computing power in the universe, you
still don't get the original file.
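
For anyone who wants to play with the idea, here is a toy sketch of a
(k, n) threshold scheme in the style of Shamir's secret sharing.  The
language (Python), the prime, and the share format are my own choices
for illustration; a real scheme for Freenet would operate on large
binary chunks, not small integers.

import random

PRIME = 2**127 - 1  # a prime field big enough for a toy integer secret

def make_shares(secret, k, n):
    # random polynomial of degree k-1 whose constant term is the secret
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 gives back the constant term
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # modular inverse of den via Fermat's little theorem
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=5)
print(recover(shares[:3]))  # any 3 of the 5 shares give back the secret
# with only 2 shares, every candidate secret is equally likely

The point is exactly the k vs. k-1 threshold: individual shares carry
no information at all about the original.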

This could be useful for Freenet.  Those who have said that splitfiles
will not need redundancy are, I believe, incorrect.  I have not looked
at GJ's site in 0.4 (don't have a 0.4 node running, sorry) but I am
sure his argument goes like this:

If the probability of not finding a given chunk is p, and files are
split into constant-sized chunks of maximum size B, then a file of size
S will be split into N = S/B chunks.  The probability of downloading
all of these chunks successfully is (1-p)^N = ((1-p)^(1/B))^S, i.e. it
falls off exponentially in the file size.  To take an example, with
blocksize B = 2 MB and S = 650 MB (an ISO image), N = 325.  If Freenet
requests fail with probability 1%, the probability of getting the
entire file is 0.99^325 = 0.038, or about 3.8%.  Not bloody likely.
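
The arithmetic is easy to check; a quick sketch with the same numbers:

p = 0.01      # per-request failure probability
B = 2         # chunk size in MB
S = 650       # file size in MB
N = S // B    # 325 chunks
print((1 - p) ** N)  # ~0.038, i.e. roughly a 1-in-26 chance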

If, on the other hand, some coding is done (perhaps the secret sharing
above, or a better scheme), Freenet may be viewed as an erasure
channel.  As anyone who has studied channel coding can tell you, the
capacity of such a channel is 1-e, where e is the probability of a
piece of information being erased.  No exponentially bad scaling is
necessary.
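
Concretely, and assuming (as above) that chunk losses are independent:
if the file is expanded into M coded chunks such that *any* N of them
reconstruct it, the download succeeds whenever at least N of the M
requests come back.  A rough sketch (the particular M is just a value I
picked for illustration):

from math import comb

def p_success(N, M, p):
    # P(at least N of M chunks survive), each lost independently w.p. p
    return sum(comb(M, k) * (1 - p)**k * p**(M - k)
               for k in range(N, M + 1))

print(p_success(325, 325, 0.01))  # no redundancy: ~0.038
print(p_success(325, 340, 0.01))  # ~4.6% overhead: > 0.99999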

Saying redundancy is not needed is like telling cell phone makers not
to use error correcting codes because they "waste" capacity, and to
just make the data transmission perfect instead.  Guess what: that's
not going to happen, and Freenet is not going to deliver files with
probability 1 either.  On the other hand, for a modest cost in bits (in
the limit you must store (1/(1-p)) * (filesize), and 1/(1-p) is about
1+p for small p) you *CAN* get perfect reliability (or exponentially
close to it).  This is Shannon's channel coding theorem.
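
The "exponentially close" part can also be seen numerically (again with
independent losses and a fixed 5% redundancy, both my own assumptions):
as the number of chunks grows at constant relative overhead, the
failure probability collapses.

from math import comb

def p_fail(N, M, p):
    # P(fewer than N of the M chunks survive)
    return sum(comb(M, k) * (1 - p)**k * p**(M - k) for k in range(N))

p = 0.01
for N in (100, 400, 1600):
    M = int(N * 1.05)             # 5% redundancy
    print(N, M, p_fail(N, M, p))  # shrinks roughly exponentially in N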

To sum up: in the above example, with a Freenet that has a 99% request
success rate and 2 MB splitfile chunks, an ISO image can be downloaded
successfully with probability of only about 3.8%.  Using an erasure
code instead, one could make the file a few percent bigger (approaching
the 1/(1-p) limit, i.e. roughly 1% here, as files get larger) and have
a probability exponentially close to 1 of successfully downloading it.
Which do you think is better?

Oscar.
-- 
boykin at pobox.com        http://pobox.com/~boykin        ICQ: 5118680
Key fingerprint = 159A FA02 DF12 E72F B68F  5B2D C368 3BCA 36D7 CF28