On Fri, 15 Apr 2005, C. Scott Ananian wrote:
> Why are blobs per-file? [After all, Linus insists that files are an
> illusion.] Why not just have 'chunks', and assemble *these*
> into blobs (read, 'files')? A good chunk size would fit evenly into some
> number of disk blocks (no wasted space!).
I actually considered that. I ended up not doing it, because it's not
obvious how to "block" things up (and even more so because while I like
the notion, it flies in the face of the other issues I had: performance
and simplicity).
The problem with chunking is:
- it complicates a lot of the routines. Things like "is this file
unchanged" suddenly become "is this file still the same set of chunks",
which is just a _lot_ more code and a lot more likely to have bugs (see
the sketch after this list).
- you have to find a blocking factor. I thought of just doing it with
fixed-size chunks, and that just doesn't help at all.
- we already have wasted space due to the low-level filesystem (as
opposed to "git") usually being block-based, which means that space
utilization for small objects tends to suck (a 100-byte object in its
own 4kB filesystem block wastes over 95% of the block). So you really
want to prefer objects that are several kB (compressed), and a small
block just wastes tons of space.
- there _is_ a natural blocking factor already. That's what a file
boundary really is within the project, and finding any other is really
quite hard.
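
To make the bookkeeping difference concrete, here is a minimal sketch
(not git code; it assumes OpenSSL's SHA1() and a made-up 8kB blocking
factor) of what "is this file unchanged" turns into under each scheme:

/*
 * A minimal sketch, not git code: assumes OpenSSL's SHA1() and a
 * hypothetical fixed 8kB blocking factor.  Build with -lcrypto.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

#define CHUNK_SIZE 8192                 /* hypothetical blocking factor */

/* Whole-file blobs: "unchanged" is one hash compare. */
static int blob_unchanged(const unsigned char *data, size_t len,
                          const unsigned char old_sha1[20])
{
        unsigned char sha1[20];

        SHA1(data, len, sha1);
        return !memcmp(sha1, old_sha1, 20);
}

/*
 * Chunked blobs: the same question means re-hashing every chunk,
 * plus extra cases for the short tail and for a changed length.
 */
static int chunks_unchanged(const unsigned char *data, size_t len,
                            unsigned char old[][20], size_t nr_old)
{
        size_t nr = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
        size_t i;

        if (nr != nr_old)
                return 0;               /* chunk count changed */
        for (i = 0; i < nr; i++) {
                size_t off = i * CHUNK_SIZE;
                size_t n = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;
                unsigned char sha1[20];

                SHA1(data + off, n, sha1);
                if (memcmp(sha1, old[i], 20))
                        return 0;       /* this chunk changed */
        }
        return 1;
}

int main(void)
{
        static const unsigned char file[] = "pretend file contents\n";
        size_t len = sizeof(file) - 1;
        unsigned char whole[20], chunk0[20];

        SHA1(file, len, whole);         /* stored whole-file hash */
        SHA1(file, len, chunk0);        /* one chunk: file is < 8kB */

        printf("blob check:  %d\n", blob_unchanged(file, len, whole));
        printf("chunk check: %d\n",
               chunks_unchanged(file, len, &chunk0, 1));
        return 0;
}

Even this toy version needs a chunk-count check and a short-tail case
that the one-line whole-file compare doesn't, and a real implementation
would also have to store and update all the per-chunk hashes.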
So I'm personally 100% sure that it's not worth it. But I'm not opposed to
the _concept_: it makes total sense in the "filesystem" view, and is 100%
equivalent to having an inode with pointers to blocks. I just don't think
the concept plays out well in reality.
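
In that view, a chunked blob would itself just be an inode: a small
header plus a list of chunk hashes standing in for block pointers.
Purely as an illustration (this is not a proposed git format, and the
struct names are made up):

/* Illustration only: a "chunked blob" is just an inode. */
struct chunk_ref {                      /* ~ a block pointer, by hash */
        unsigned char sha1[20];
        unsigned long size;
};

struct chunked_blob {                   /* ~ the inode itself */
        unsigned long size;             /* total file size */
        unsigned long nr_chunks;
        struct chunk_ref chunks[];      /* "pointers" to the chunks */
};

The indirection is exactly the inode/block split; the argument above is
just that at git's level the file boundary already gives you that split
for free.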