Venti is just a block storage server.
It accepts blocks ranging in size from 0 bytes to 56kB.  
The application can use whatever block size it likes. 
8kB is typical.  When I back up file systems I use the 
underlying file system block size.

When you write a block that hashes to an existing block,
the current Venti server (in Plan 9) checks that the two
are the same.  My newer Venti server (in Plan 9 from User Space)
skips this check, which is faster.
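
To make the difference concrete, here is a minimal sketch of the two
write paths.  It is a toy in-memory store, not code from either server:
the names BlockStore and verify_on_write are made up for illustration,
and it assumes blocks are named by their SHA1 hash.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, not Venti code)."""

    MAX_BLOCK = 56 * 1024  # largest block accepted, in bytes

    def __init__(self, verify_on_write=True):
        self.blocks = {}  # score (SHA1 digest) -> block contents
        self.verify_on_write = verify_on_write

    def write(self, data: bytes) -> bytes:
        if len(data) > self.MAX_BLOCK:
            raise ValueError("block too large")
        score = hashlib.sha1(data).digest()
        old = self.blocks.get(score)
        if old is None:
            # First time we see this score: store the block.
            self.blocks[score] = data
        elif self.verify_on_write and old != data:
            # Same score, different contents: a real collision.
            raise ValueError("hash collision on write")
        # Without the check, a duplicate score is trusted as-is.
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]
```

With verify_on_write=False a write to an existing score never has to
fetch and compare the stored block, which is where the speedup comes
from in this sketch.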

All this discussion about things more likely than two
random blocks having the same hash is amusing, but
there is a serious point no one has brought up.  
All the math depends on blocks chosen randomly.  An adversary
might actually come up with two blocks with the same
hash, not by random search but by being very clever.
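
For reference, the random-block math is roughly the following.  This is
a sketch that assumes a 160-bit hash such as SHA1 and blocks that behave
like independent random inputs, which is exactly the assumption an
adversary breaks.  The function name collision_bound is made up.

```python
from fractions import Fraction


def collision_bound(n_blocks: int, hash_bits: int = 160) -> Fraction:
    """Union bound on the chance that any two of n random blocks collide.

    There are n*(n-1)/2 pairs and each pair collides with
    probability 1/2**hash_bits.
    """
    return Fraction(n_blocks * (n_blocks - 1), 2 ** (hash_bits + 1))


# 2**40 blocks of 8kB each is 8 PiB of data.
p = collision_bound(2 ** 40)
print(float(p))          # a little under 2**-81
print(p < Fraction(1, 2 ** 81))
```

None of this says anything about blocks chosen on purpose to collide.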

Some researchers in China recently claimed to have 
a program that generates two different blocks with the same
MD5 hash (and I think there is one for SHA1 too, though it takes
longer).  I ran the MD5 program for a while on my laptop but
it did not finish.  Maybe I just got unlucky (there is still a
little randomness).  It was supposed to be able to finish
in something like 45 minutes and I ran it overnight.

I don't know whether their approach generates two 
blocks of the same length.  I do know that if they are
trying to match a pre-existing hash they do so by adding
padding in the form of some kind of comment.  For example,
they could start with a PDF that said "Buy 1M shares of X",
change the Buy to Sell, and then insert some comments in
the PDF to make the hash of the new document the same
as the hash of the original document and thus the old
signature would work for the new document.  This is bad
and will get worse as computers get faster, and various 
people are worried about how to switch to SHA256.

I am not worried.  If there is some adversary using
your Venti system, there are simpler attacks they could
use to render it inoperable (like fill it up).  I am happy to
assume that the Venti clients are playing nicely.
If, at some point in the future, this became a real problem,
the types that Venti stores with each block would allow
a Venti server to be "transcoded" into a different hash
function pretty easily.  (Of course, the clients would have
to be informed of the new hashes of their root blocks.)
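
A rough sketch of what such a transcoding pass could look like, under
heavy simplification: the two type tags DATA and POINTER, the dict-based
stores, and the helper names put and transcode are all made up here, and
a pointer block is assumed to be nothing but concatenated child hashes.
The real block types and formats are richer than this.

```python
import hashlib

DATA, POINTER = 0, 1  # hypothetical type tags stored with each block


def put(store, hashfn, btype, data):
    """Store (type, data) under the hash of data; return that hash."""
    score = hashfn(data).digest()
    store[score] = (btype, data)
    return score


def transcode(old, new, score, old_len=20, memo=None):
    """Copy the tree rooted at score from old (SHA1) into new (SHA256).

    Returns the new root hash.  The type tag tells us which blocks
    hold hashes that must be rewritten and which are opaque data.
    """
    if memo is None:
        memo = {}
    if score in memo:  # shared subtree already converted
        return memo[score]
    btype, data = old[score]
    if btype == POINTER:
        kids = [data[i:i + old_len] for i in range(0, len(data), old_len)]
        data = b"".join(transcode(old, new, k, old_len, memo) for k in kids)
    new_score = put(new, hashlib.sha256, btype, data)
    memo[score] = new_score
    return new_score


old, new = {}, {}
a = put(old, hashlib.sha1, DATA, b"aaa")
b = put(old, hashlib.sha1, DATA, b"bbb")
root = put(old, hashlib.sha1, POINTER, a + b)
new_root = transcode(old, new, root)  # what the client must be told
```

One thing this sketch ignores: the rewritten pointer blocks are larger
(32-byte hashes instead of 20-byte ones), so a real conversion might have
to re-split pointer blocks to stay under the block size limit.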

Russ
