On Wed, 20 Apr 2005, C. Scott Ananian wrote:
> Hmm.  Are our index files too large, or is there some other factor?

They _are_ pretty large, but they have to be,

For the kernel, the index file is about 1.6MB. That's 

 - 17,000+ files and filenames
 - stat information for all of them
 - the sha1 for them all

ie for the kernel it averages to 93.5 bytes per file. Which is actually 
pretty dense (just the sha1 and stat information is about half of it, and 
those are required).

> I was considering using a chunked representation for *all* files (not just 
> blobs), which would avoid the original 'trees must reference other trees 
> or they become too large' issue -- and maybe the performance issue you're 
> referring to, as well?

No. The most common index file operation is reading, and that's the one 
that has to be _fast_. And it is - it's a single "mmap" and some parsing.

In fact, writing it is pretty fast too, exactly because the index file is 
totally linear and isn't compressed or anything fancy like that. It's a 
_lot_ faster than the "tree objects", exactly because it doesn't need to 
be as careful.

The main cost of the index file is probably the fact that I add a sha1 
signature of the file into itself to verify that it's ok. The advantage is 
that the signature means that the file is ok, and the parsing of it can be 
much more relaxed. You win some, you lose some.

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to