On Fri, 15 Apr 2005, Linus Torvalds wrote:

> The problem with chunking is:
> - it complicates a lot of the routines. Things like "is this file
>   unchanged" suddenly become "is this file still the same set of chunks",
>   which is just a _lot_ more code and a lot more likely to have bugs.

The blob still has the same hash; therefore the file is still the same.
Nothing looks inside blobs; callers just want either the hash or the full
contents (if I understand the algorithms correctly).
I agree it's more code, but I think it can be nicely layered.
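
One way the layering could look, as a minimal sketch. The names `ChunkStore`, `put`, `get` and `unchanged` are made up for illustration and are not part of git, and hashing the raw contents with SHA-1 is a simplifying assumption. Callers only ever see the hash of the whole blob; the chunk list stays inside the store.

```python
import hashlib


class ChunkStore:
    """Hypothetical store: blobs are keyed by the hash of their full contents."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk hash -> chunk bytes
        self.blobs = {}   # blob hash  -> ordered list of chunk hashes

    def put(self, data: bytes) -> str:
        # The blob id depends only on the contents, never on the chunking.
        blob_id = hashlib.sha1(data).hexdigest()
        chunk_ids = []
        for start in range(0, len(data), self.chunk_size):
            piece = data[start:start + self.chunk_size]
            chunk_id = hashlib.sha1(piece).hexdigest()
            self.chunks.setdefault(chunk_id, piece)  # identical chunks are stored once
            chunk_ids.append(chunk_id)
        self.blobs[blob_id] = chunk_ids
        return blob_id

    def get(self, blob_id: str) -> bytes:
        # Reassemble the full contents; this is also the "unchunk" conversion.
        return b"".join(self.chunks[c] for c in self.blobs[blob_id])


def unchanged(blob_id: str, current: bytes) -> bool:
    # "Is this file unchanged" stays a single hash comparison.
    return hashlib.sha1(current).hexdigest() == blob_id
```

The point of the sketch is that `unchanged` never needs to know whether the blob is stored in one piece or in many.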


> - you have to find a blocking factor. I thought of just doing it in fixed
>   chunks, and that just doesn't help at all.

rsync uses a fixed chunk size, but a chunk can start at any offset (i.e., it is not constrained to fixed boundaries). This means that adding a single line to the file works like you'd expect: the unchanged chunks are still found, even though every one of them has shifted off its old boundary. [I think this is what you're talking about.]
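
A rough sketch of that idea, in the spirit of rsync's rolling checksum but not its actual code. The function names (`weak`, `roll`, `find_matches`) are hypothetical, and using SHA-1 as the strong check is an assumption. Blocks of the old file are indexed at fixed boundaries; the new file is then scanned one byte at a time, so a known block is found at whatever offset it ends up.

```python
import hashlib

M = 1 << 16  # both checksum halves are kept modulo 2**16


def weak(block: bytes):
    """Cheap two-part checksum of a block, computed from scratch."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out: int, inn: int, size: int):
    """Slide the window one byte: drop byte `out`, take in byte `inn`."""
    a = (a - out + inn) % M
    b = (b - size * out + a) % M
    return a, b


def find_matches(old: bytes, new: bytes, size: int):
    """Return (offset_in_new, block_index_in_old) for old blocks found in new."""
    table = {}
    for start in range(0, len(old) - size + 1, size):
        block = old[start:start + size]
        entry = (hashlib.sha1(block).digest(), start // size)
        table.setdefault(weak(block), []).append(entry)

    matches, pos, state = [], 0, None
    while pos + size <= len(new):
        if state is None:  # start of scan, or just jumped past a match
            state = weak(new[pos:pos + size])
        hit = None
        for strong, index in table.get(state, ()):
            # Confirm with the strong hash only when the weak checksum matches.
            if hashlib.sha1(new[pos:pos + size]).digest() == strong:
                hit = index
                break
        if hit is not None:
            matches.append((pos, hit))
            pos += size
            state = None
        else:
            if pos + size < len(new):
                state = roll(*state, new[pos], new[pos + size], size)
            pos += 1
    return matches
```

With a one-byte insertion at the front of the file, none of the new file's fixed-boundary blocks equal an old block, yet the scan above still locates every old block at its shifted offset.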


> - we already have wasted space due to the low-level filesystem (as
>   opposed to "git") usually being block-based, which means that space
>   utilization for small objects tends to suck. So you really want to
>   prefer objects that are several kB (compressed), and a small block just
>   wastes tons of space.

Not on (say) reiserfs, and not over the network. I'm proposing (at the moment) easy conversion from chunked to unchunked disk representation,
so that you can leave things unchunked if (for example) you know you're running ext2 with a large block size.


> - there _is_ a natural blocking factor already. That's what a file
>   boundary really is within the project, and finding any other is really
>   quite hard.

Well, yes, it may be nontrivial. But 'quite hard' depends on your perspective, I guess. Given a cache of existing chunks, it's just a few table lookups. =)
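
To make the "few table lookups" concrete, here is a deliberately naive sketch. `encode`, `decode` and the `("ref", ...)` / `("lit", ...)` operations are invented for illustration, and it rehashes every window with SHA-1 instead of using a rolling checksum, so it shows the lookups rather than an efficient scan. The cache is assumed to be a plain dict from chunk hash to chunk bytes.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()


def encode(data: bytes, cache: dict, size: int):
    """Describe `data` as references to cached chunks plus literal bytes."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(data):
        window = data[pos:pos + size]
        cid = chunk_id(window)
        if len(window) == size and cid in cache:  # the table lookup
            if literal:
                ops.append(("lit", bytes(literal)))
                literal = bytearray()
            ops.append(("ref", cid))
            pos += size
        else:
            literal.append(data[pos])  # no known chunk starts here
            pos += 1
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops


def decode(ops, cache: dict) -> bytes:
    """Rebuild the original bytes from the operations."""
    return b"".join(cache[arg] if kind == "ref" else arg for kind, arg in ops)
```

For example, with `b"aaaa"` and `b"bbbb"` already in the cache, encoding `b"Xaaaa--bbbb"` with a chunk size of 4 yields two references and two short literals, and `decode` restores the input exactly.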


> So I'm personally 100% sure that it's not worth it. But I'm not opposed to
> the _concept_: it makes total sense in the "filesystem" view, and is 100%
> equivalent to having an inode with pointers to blocks. I just don't think
> the concept plays out well in reality.

So I guess I'll have to implement this and find out, won't I? =)
 --scott

( http://cscott.net/ )