On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:

> A merkle-tree (which I think you initially pointed me at) makes the hash 
> of the internal nodes be a hash of the chunk's hashes; ie not a straight 
> content hash.  This is roughly what my current implementation does, but
> I would like to identify each subtree with the hash of the 
> *(expanded) contents of that subtree* (ie no explicit reference to 
> subtree hashes).  This makes it interoperable with non-chunked or 
> differently-chunked representations, in that the top-level hash is *just 
> the hash of the complete content*, not some hash-of-subtree-hashes.  Does 
> that make more sense?

Yes, thank you. But I would like to argue against this:

You can make the representations interoperable
if you calculate the hash for the non-chunked
representation exactly as if the file were stored
chunked, but simply do not store it that way.

Of course this is not backward compatible with the
monolithic hash and not compatible with a differently
chunked representation (but you could store subtrees
unchunked if you think your chunks are too small).
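
As a rough sketch of this idea in Python: the fixed-size
chunking rule, the hash-of-chunk-hashes scheme, and all names
below are assumptions made only for illustration, not what
any posted code does. The point is that the hash is defined
by the chunking rule, so it can be computed for a file that
is stored in one piece.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split(data, size=CHUNK_SIZE):
    """Cut data into fixed-size chunks (a stand-in for the real chunker)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def hash_of_chunks(chunks):
    """Hash a chunked file: SHA1 over the SHA1 digests of its chunks."""
    outer = hashlib.sha1()
    for chunk in chunks:
        outer.update(hashlib.sha1(chunk).digest())
    return outer.hexdigest()


def hash_unchunked(data, size=CHUNK_SIZE):
    """Hash a file stored in one piece exactly as if it were chunked."""
    return hash_of_chunks(split(data, size))


data = b"hello chunked world"
stored_chunks = split(data)  # what a chunked repository would keep

# Same id whether or not the file is physically stored in chunks.
same_id = hash_unchunked(data) == hash_of_chunks(stored_chunks)

# Not compatible with the monolithic hash ...
differs_from_monolithic = hash_unchunked(data) != hashlib.sha1(data).hexdigest()

# ... nor with a different chunking of the same content.
differs_across_chunkings = hash_unchunked(data, 4) != hash_unchunked(data, 8)
```

Both incompatibilities named above show up directly: the id
depends on the chunking rule, not on how the bytes are stored.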

> The code I posted doesn't demonstrate this very well, but now that Linus 
> has abandoned the 'hash of compressed content' stuff, my next code posting 
> should show this more clearly.

I think the hash of a treap piece should be calculated
from the hashes of the prefix and suffix trees and the
already calculated hash of the uncompressed data. This makes
hashing nearly as cheap as in Linus's version, which is
important because checking whether a given file has
identical content to a stored version should be fast.
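
A minimal sketch of that calculation in Python. The `Piece`
layout, the `EMPTY` marker for a missing subtree, and the
order in which the three digests are combined are assumptions
for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Assumed marker for "no subtree here"; any fixed 20-byte value would do.
EMPTY = hashlib.sha1(b"").digest()


@dataclass
class Piece:
    """One treap piece: its own data plus optional prefix/suffix subtrees."""
    data: bytes
    prefix: Optional["Piece"] = None
    suffix: Optional["Piece"] = None


def combine(prefix_hash, data_hash, suffix_hash):
    """Hash of a piece from three digests: only 60 bytes are hashed here."""
    return hashlib.sha1(prefix_hash + data_hash + suffix_hash).digest()


def piece_hash(piece):
    """Hash of a subtree, built from subtree hashes and the data hash."""
    if piece is None:
        return EMPTY
    data_hash = hashlib.sha1(piece.data).digest()  # each data byte hashed once
    return combine(piece_hash(piece.prefix), data_hash, piece_hash(piece.suffix))


def content(piece):
    """Expanded content of a subtree: prefix, then own data, then suffix."""
    if piece is None:
        return b""
    return content(piece.prefix) + piece.data + content(piece.suffix)


left = Piece(b"int main")
right = Piece(b"return 0; }")
root = Piece(b"(void) { ", left, right)
root_id = piece_hash(root)
```

Every data byte goes through SHA1 once, and each piece adds
one extra hash over 60 bytes, so the total cost stays close
to a single pass over the content.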

> >If I don't miss anything essential, you can validate
> >each treap piece at the moment you get it from the
> >network with its SHA1 hash and then proceed with
> >downloading the prefix and suffix tree (in parallel
> >if you have more than one peer a la bittorrent).
> Yes, I guess this is the detail I was going to abandon. =)
> I viewed the fact that the top-level hash was dependent on the exact chunk 
> makeup a 'misfeature', because it doesn't allow easy interoperability with 
> existing non-chunked repos.

I thought of this as a misfeature too, before I realized
how many advantages it has.
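
One of them is the per-piece validation quoted above. A rough
Python sketch, assuming a piece travels as a (prefix id, data,
suffix id) triple and that a peer can be modelled as a dict
from id to triple; the wire format and all names here are
hypothetical.

```python
import hashlib

NO_SUBTREE = b"\x00" * 20  # assumed marker for an absent prefix/suffix tree


def sha1(data):
    return hashlib.sha1(data).digest()


def piece_id(prefix_id, data, suffix_id):
    """Id of a piece: depends on its data and on both subtree ids."""
    return sha1(prefix_id + sha1(data) + suffix_id)


def build(chunks, store):
    """Store chunks as a balanced tree of pieces in `store`; return root id."""
    if not chunks:
        return NO_SUBTREE
    mid = len(chunks) // 2
    prefix_id = build(chunks[:mid], store)
    suffix_id = build(chunks[mid + 1:], store)
    pid = piece_id(prefix_id, chunks[mid], suffix_id)
    store[pid] = (prefix_id, chunks[mid], suffix_id)
    return pid


def fetch(root_id, peer):
    """Fetch a tree from `peer`, checking every piece as soon as it arrives."""
    pending, got = [root_id], {}
    while pending:
        want = pending.pop()
        if want == NO_SUBTREE or want in got:
            continue
        prefix_id, data, suffix_id = peer[want]
        if piece_id(prefix_id, data, suffix_id) != want:
            raise ValueError("corrupt piece")
        got[want] = (prefix_id, data, suffix_id)
        # Both subtrees are now known by id and could be requested
        # from different peers in parallel.
        pending += [prefix_id, suffix_id]
    return got


def content(root_id, pieces):
    """Reassemble the expanded content from validated pieces."""
    if root_id == NO_SUBTREE:
        return b""
    prefix_id, data, suffix_id = pieces[root_id]
    return content(prefix_id, pieces) + data + content(suffix_id, pieces)


peer = {}
root = build([b"aa", b"bb", b"cc"], peer)
pieces = fetch(root, peer)
```

A bad piece is rejected before any of its subtrees is
requested, which would not be possible if the top-level id
were only the hash of the complete content.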


One night, when little Giana from Milano was fast asleep,
she had a strange dream.
