On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:

Hi,

> A merkle-tree (which I think you initially pointed me at) makes the hash
> of the internal nodes be a hash of the chunk's hashes; ie not a straight
> content hash. This is roughly what my current implementation does, but
> I would like to identify each subtree with the hash of the
> *(expanded) contents of that subtree* (ie no explicit reference to
> subtree hashes). This makes it interoperable with non-chunked or
> differently-chunked representations, in that the top-level hash is *just
> the hash of the complete content*, not some hash-of-subtree-hashes. Does
> that make more sense?
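To make the distinction concrete, here is a minimal Python sketch that contrasts the two identities for the same content. The function names `content_hash` and `merkle_hash` are hypothetical, and the Merkle variant is simplified to a single level (the root hashes the concatenated SHA-1 digests of the chunks) rather than a full tree.

```python
import hashlib

def content_hash(chunks):
    """Identity = SHA-1 of the expanded bytes, fed incrementally chunk by chunk."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)  # same byte stream no matter where the chunk boundaries fall
    return h.hexdigest()

def merkle_hash(chunks):
    """Simplified one-level Merkle identity: SHA-1 over the chunks' own SHA-1 digests."""
    child_digests = b"".join(hashlib.sha1(chunk).digest() for chunk in chunks)
    return hashlib.sha1(child_digests).hexdigest()

# The same content, split at two different boundaries.
split_a = [b"hello ", b"world"]
split_b = [b"hel", b"lo world"]
print(content_hash(split_a) == content_hash(split_b))  # chunking does not matter
print(merkle_hash(split_a) == merkle_hash(split_b))    # chunking changes the root
```

Because SHA-1 sees the same byte stream either way, `content_hash` agrees with a plain hash of the whole file regardless of chunking, which is the interoperability described in the quote. The price is that each subtree's identity must be computed over all of its expanded bytes rather than from its children's hashes.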
Yes, thank you. But I would like to argue against this: you can make the
representations interoperable if you calculate the hash for a non-chunked
representation exactly as if the file were stored chunked, and simply do
not store it that way. Of course this is not backward compatible with the
monolithic hash, and it is not compatible with a differently chunked
representation (but you could store subtrees unchunked if you think your
chunks are too small).

> The code I posted doesn't demonstrate this very well, but now that Linus
> has abandoned the 'hash of compressed content' stuff, my next code posting
> should show this more clearly.

I think the hash of a treap piece should be calculated from the hashes of
its prefix and suffix trees and the already calculated hash of its
uncompressed data. This makes hashing nearly as cheap as in Linus's
version, which is important because checking whether a given file has
identical content to a stored version should be fast.

> >If I don't miss anything essential, you can validate
> >each treap piece at the moment you get it from the
> >network with its SHA1 hash and then proceed with
> >downloading the prefix and suffix tree (in parallel
> >if you have more than one peer a la bittorrent).
>
> Yes, I guess this is the detail I was going to abandon. =)
>
> I viewed the fact that the top-level hash was dependent on the exact chunk
> makeup a 'misfeature', because it doesn't allow easy interoperability with
> existing non-chunked repos.

I thought of this as a misfeature too, before I realized how many
advantages it has.

Martin

-- 
One night, when little Giana from Milano was fast asleep, she had a strange dream.
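The piece hashing proposed above can be sketched as follows, assuming a hypothetical node layout in which each treap piece holds one chunk of uncompressed data plus a prefix and a suffix subtree. The byte layout that gets hashed (prefix id, suffix id, then SHA-1 of the data) and the all-zero id for an empty subtree are illustrative assumptions, not the format of either implementation discussed in this thread.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder id for an absent subtree (assumption, not from the thread).
EMPTY_ID = b"\x00" * 20

@dataclass
class TreapNode:
    data: bytes                      # this piece's chunk of uncompressed content
    prefix: Optional["TreapNode"]    # subtree holding the content before `data`
    suffix: Optional["TreapNode"]    # subtree holding the content after `data`

def node_id(node: Optional[TreapNode]) -> bytes:
    """Piece id = SHA-1(prefix id + suffix id + SHA-1(data)).

    A real store would cache the data hash (it is already computed when the
    content is first hashed), so rehashing a piece costs only one small SHA-1.
    """
    if node is None:
        return EMPTY_ID
    data_hash = hashlib.sha1(node.data).digest()
    return hashlib.sha1(node_id(node.prefix) + node_id(node.suffix) + data_hash).digest()

def verify_piece(expected_id: bytes, data: bytes, prefix_id: bytes, suffix_id: bytes) -> bool:
    """Check one downloaded piece using only its own data and its children's ids."""
    data_hash = hashlib.sha1(data).digest()
    return hashlib.sha1(prefix_id + suffix_id + data_hash).digest() == expected_id

def expand(node: Optional[TreapNode]) -> bytes:
    """Reassemble the file: prefix content, then this piece's data, then suffix content."""
    if node is None:
        return b""
    return expand(node.prefix) + node.data + expand(node.suffix)

root = TreapNode(b"big ", TreapNode(b"hello ", None, None), TreapNode(b"world", None, None))
print(expand(root))  # b'hello big world'
```

`verify_piece` needs only the piece's own data and the two child ids it carries, so a downloader can check each piece as it arrives and then fetch both subtrees, possibly from different peers. The flip side, as discussed above, is that the same bytes chunked differently (for example a single node holding the whole file) produce a different top-level id.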