On Fri, 15 Apr 2005, C. Scott Ananian wrote:
> Why are blobs per-file? [After all, Linus insists that files are an
> illusion.] Why not just have 'chunks', and assemble *these*
> into blobs (read, 'files')? A good chunk size would fit evenly into some
> number of disk blocks (no wasted space!).
I actually considered that. I ended up not doing it, because it's not
obvious how to "block" things up (and even more so because while I like
the notion, it flies in the face of the other issues I had: performance
and simplicity).
The problem with chunking is:
- it complicates a lot of the routines. Things like "is this file
unchanged" suddenly become "is this file still the same set of chunks",
which is just a _lot_ more code and a lot more likely to have bugs (see
the sketch after this list).
- you have to find a blocking factor. I thought of just doing it with
fixed-size chunks, and that just doesn't help at all.
- we already have wasted space due to the low-level filesystem (as
opposed to "git") usually being block-based, which means that space
utilization for small objects tends to suck (a 100-byte object in its
own 4kB filesystem block wastes over 95% of the block). So you really
want to prefer objects that are several kB (compressed), and a small
block just wastes tons of space.
- there _is_ a natural blocking factor already. That's what a file
boundary really is within the project, and finding any other is really
quite hard.
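
To make the bookkeeping difference concrete, here is a minimal sketch
(not git code; it assumes OpenSSL's SHA1() and a made-up 8kB blocking
factor) of what "is this file unchanged" turns into under each scheme:

/*
 * A minimal sketch, not git code: assumes OpenSSL's SHA1() and a
 * hypothetical fixed 8kB blocking factor.  Build with -lcrypto.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

#define CHUNK_SIZE 8192                 /* hypothetical blocking factor */

/* Whole-file blobs: "unchanged" is one hash compare. */
static int blob_unchanged(const unsigned char *data, size_t len,
                          const unsigned char old_sha1[20])
{
        unsigned char sha1[20];

        SHA1(data, len, sha1);
        return !memcmp(sha1, old_sha1, 20);
}

/*
 * Chunked blobs: the same question means re-hashing every chunk,
 * plus extra cases for the short tail and for a changed length.
 */
static int chunks_unchanged(const unsigned char *data, size_t len,
                            unsigned char old[][20], size_t nr_old)
{
        size_t nr = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
        size_t i;

        if (nr != nr_old)
                return 0;               /* chunk count changed */
        for (i = 0; i < nr; i++) {
                size_t off = i * CHUNK_SIZE;
                size_t n = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;
                unsigned char sha1[20];

                SHA1(data + off, n, sha1);
                if (memcmp(sha1, old[i], 20))
                        return 0;       /* this chunk changed */
        }
        return 1;
}

int main(void)
{
        static const unsigned char file[] = "pretend file contents\n";
        size_t len = sizeof(file) - 1;
        unsigned char whole[20], chunk0[20];

        SHA1(file, len, whole);         /* stored whole-file hash */
        SHA1(file, len, chunk0);        /* one chunk: file is < 8kB */

        printf("blob check:  %d\n", blob_unchanged(file, len, whole));
        printf("chunk check: %d\n",
               chunks_unchanged(file, len, &chunk0, 1));
        return 0;
}

Even this toy version needs a chunk-count check and a short-tail case
that the one-line whole-file compare doesn't, and a real implementation
would also have to store and update all the per-chunk hashes.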
So I'm personally 100% sure that it's not worth it. But I'm not opposed to
the _concept_: it makes total sense in the "filesystem" view, and is 100%
equivalent to having an inode with pointers to blocks. I just don't think
the concept plays out well in reality.
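
In that view, a chunked blob would itself just be an inode: a small
header plus a list of chunk hashes standing in for block pointers.
Purely as an illustration (this is not a proposed git format, and the
struct names are made up):

/* Illustration only: a "chunked blob" is just an inode. */
struct chunk_ref {                      /* ~ a block pointer, by hash */
        unsigned char sha1[20];
        unsigned long size;
};

struct chunked_blob {                   /* ~ the inode itself */
        unsigned long size;             /* total file size */
        unsigned long nr_chunks;
        struct chunk_ref chunks[];      /* "pointers" to the chunks */
};

The indirection is exactly the inode/block split; the argument above is
just that at git's level the file boundary already gives you that split
for free.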