On Wed, 20 Apr 2005, Linus Torvalds wrote:

> What's the disk usage results? I'm on ext3, for example, which means that
> even small files invariably take up 4.125kB on disk (with the inode).
>
> Even uncompressed, most source files tend to be small. Compressed, I'm
> seeing the median blob size being ~1.6kB in my trivial checks. That's
> blobs only, btw.

I'm working on it. The format was chosen so that blobs under 1 block
long *stay* 1 block long; i.e. there's no 'chunk plus index file'
overhead. So the chunking should only kick in on multiple-block files.
I hacked 'convert-cache' to do the conversion, but it's running out of
memory on linux-2.6.git. I found a few memory leaks in your code =) but
I certainly seem to be missing a big one still (maybe it's in my code!).
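
To make the "small blobs stay one block" rule concrete, here is a
minimal sketch. It is an illustration only: the fixed-size splitting,
the newline-separated index layout, and the names BLOCK_SIZE, sha1_hex
and store_blob are all assumptions made for the example, not the actual
on-disk format or the chunking scheme being tested.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed filesystem block size, in bytes


def sha1_hex(data: bytes) -> str:
    """Name an object by the SHA-1 of its contents."""
    return hashlib.sha1(data).hexdigest()


def store_blob(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return the {name: contents} objects that would be written for one blob.

    Hypothetical layout: a blob that fits in one block is stored as a single
    object with no index; a larger blob becomes chunk objects plus one index
    object listing the chunk names in order.
    """
    if len(data) <= block_size:
        # Small blob: exactly one object, so no 'chunk plus index' overhead.
        return {sha1_hex(data): data}

    # Fixed-size splitting is a stand-in for whatever chunking is really used.
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    objects = {sha1_hex(c): c for c in chunks}  # identical chunks share a name
    index = "\n".join(sha1_hex(c) for c in chunks).encode("ascii")
    objects[sha1_hex(index)] = index
    return objects
```

With this layout a 1.6kB blob costs one object, exactly as it would
without chunking, and only multi-block files pay for an index.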

When I get this working properly, my plan is to do a number of runs over
the linux-2.6 archive trying out various combinations of chunking
parameters. I *will* be watching both 'real' disk usage (rounded up to
block boundaries) and 'ideal' disk usage (on a reiserfs-type system).
The goal is to improve both, but if I can improve 'ideal' usage
significantly with a minimal penalty in 'real' usage then I would argue
it's still worth doing, since fewer bytes to send will improve network
times.
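
For reference, the two measures can be sketched as below. This is a
rough sketch, not the measurement script: the function names and the
4096-byte default block size are assumptions, and inode overhead is
ignored.

```python
def real_usage(sizes, block_size=4096):
    """Total bytes on disk when every object is rounded up to whole blocks."""
    # -(-s // block_size) is integer ceiling division.
    return sum(-(-s // block_size) * block_size for s in sizes)


def ideal_usage(sizes):
    """Total bytes on a filesystem that packs small files with no slack."""
    return sum(sizes)


# Example: a 1600-byte object and a 5000-byte object.
sizes = [1600, 5000]
print(real_usage(sizes))   # 12288 (1 block + 2 blocks of 4096 bytes)
print(ideal_usage(sizes))  # 6600
```

The gap between the two numbers is the block-rounding slack, which is
what dominates when the median object is well under one block.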

The handshaking penalties you mention are significant, but that's why rsync uses a pipelined approach. The 'upstream' part of your full-duplex pipe is 'free' while you've got bits clogging your 'downstream' pipe. The wonders of full-duplex...

Anyway, "numbers talk, etc".  I'm working on them.

( http://cscott.net/ )