Re: space compression (again)

C. Scott Ananian Sat, 16 Apr 2005 08:11:17 -0700

On Sat, 16 Apr 2005, Martin Uecker wrote:

> The right thing (TM) is to switch from SHA1 of compressed
> content for the complete monolithic file to a merkle hash tree
> of the uncompressed content. This would make the hash
> independent of the actual storage method (chunked or not).

It would certainly be nice to change to a hash of the uncompressed content, rather than a hash of the compressed content, but it's not strictly necessary, since files are fetched all at once: there's no 'read subrange' operation on blobs.
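To illustrate why a hash of the uncompressed content is independent of the storage method, here is a minimal Python sketch. It is not taken from any git source; the sample data and variable names are made up for illustration. It compresses the same bytes at two zlib levels and compares the SHA1 of the stored bytes with the SHA1 of the content itself.

```python
import hashlib
import zlib

# Hypothetical blob content, chosen only for illustration.
data = b"int main(void) { return 0; }\n" * 100

# Two valid zlib encodings of the same content (fastest vs. best compression).
fast = zlib.compress(data, 1)
best = zlib.compress(data, 9)

# Hashing the stored (compressed) bytes ties the name to the encoder settings.
name_fast = hashlib.sha1(fast).hexdigest()
name_best = hashlib.sha1(best).hexdigest()

# Hashing the uncompressed bytes gives one name however the blob is stored.
name_plain_fast = hashlib.sha1(zlib.decompress(fast)).hexdigest()
name_plain_best = hashlib.sha1(zlib.decompress(best)).hexdigest()

print("compressed names equal:  ", name_fast == name_best)
print("uncompressed names equal:", name_plain_fast == name_plain_best)
```

The two compressed streams differ at least in their zlib header, so their hashes differ, while the hash of the decompressed content is the same in both cases.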

I assume 'merkle hash tree' refers to: http://www.open-content.net/specs/draft-jchapweske-thex-02.html which is very interesting, but not quite what I was thinking. The merkle hash approach seems to require fixed chunk boundaries. The rsync approach does not use fixed chunk boundaries; this is necessary to ensure good storage reuse for the expected case (i.e., inserting a single line at the start or in the middle of the file, which would otherwise change the contents of every fixed-size chunk after the insertion point).
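The difference between the two boundary schemes can be sketched as follows. This is an assumption-laden toy, not rsync's actual algorithm and not a proposal for git: the function names, the 16-byte window, the 6-bit mask, and the use of zlib.adler32 as the window checksum are all choices made for illustration. A real implementation would update the checksum incrementally as the window slides rather than recomputing it at every position.

```python
import hashlib
import random
import zlib

def fixed_chunks(data, size=64):
    """Split at fixed offsets: 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, window=16, mask=0x3F):
    """Split wherever a checksum of the preceding `window` bytes has its
    low bits all zero, so boundaries depend only on nearby content."""
    chunks, start = [], 0
    for end in range(window, len(data)):
        if zlib.adler32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    chunks.append(data[start:])
    return chunks

def ids(chunks):
    """Name each chunk by the SHA1 of its content."""
    return {hashlib.sha1(c).hexdigest() for c in chunks}

# Hypothetical file: pseudo-random bytes, then the same file with one
# line inserted at the start.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = b"one new line\n" + old

fixed_reused = ids(fixed_chunks(old)) & ids(fixed_chunks(new))
cdc_old = ids(content_defined_chunks(old))
cdc_reused = cdc_old & ids(content_defined_chunks(new))

print("fixed boundaries:  ", len(fixed_reused), "old chunks reused")
print("content boundaries:", len(cdc_reused), "of", len(cdc_old), "old chunks reused")
```

With fixed offsets, the inserted line shifts every later byte into a different chunk, so essentially nothing is reused. With content-defined boundaries, every cut point after the insertion moves along with the data, so at most the first old chunk has to be stored again.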

Further, in the absence of subrange reads on blobs, it's not entirely clear what using a merkle hash would buy you.
 --scott

( http://cscott.net/ )
