On Fri, Feb 1, 2013 at 5:15 PM, Jeff King <p...@peff.net> wrote:
> The short-sha1 is a clever idea. Looks like it saves us on the order of
> 4MB for linux-2.6 (versus the full 20-byte sha1). Not as big as the
> savings we get from dropping the other 3 sha1's to uint32_t, but still
> not bad.
We could save another 4 bytes per commit by using 3 bytes for storing
.idx offsets. linux-2.6 only has 3M objects. It'll take many years for
big projects to reach 16M objects and need the fourth byte in
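
The 3-byte idea can be sketched as follows. This is only an illustration, not code from the patch: the helper names `put_pos24` and `get_pos24` and the big-endian layout are assumptions. Three bytes cover positions up to 2^24 - 1 = 16777215, which is where the 16M limit comes from.

```c
#include <stdint.h>

/* Largest position that fits in 3 bytes: 2^24 - 1. */
#define MAX_POS24 0xFFFFFFu

/*
 * Hypothetical helper: store the low 24 bits of an .idx position
 * in 3 bytes, most significant byte first. The caller must make
 * sure pos <= MAX_POS24; a larger value would need the fourth byte.
 */
static void put_pos24(unsigned char *out, uint32_t pos)
{
	out[0] = (pos >> 16) & 0xff;
	out[1] = (pos >> 8) & 0xff;
	out[2] = pos & 0xff;
}

/* Hypothetical helper: read back a position written by put_pos24(). */
static uint32_t get_pos24(const unsigned char *in)
{
	return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
}
```

A position around 3M (roughly the linux-2.6 object count) round-trips through `put_pos24()` and `get_pos24()` unchanged, as does `MAX_POS24` itself.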
> I guess the next steps in iterating on this would be:
> 1. splitting out the refactoring here into separate patches
> 2. squashing the cleaned-up bits into my patch 4/6
> 3. deciding whether this should go into a separate file or as part of
> index v3. Your offsets depend on the .idx file having a sorted sha1
> list. That is not likely to change, but it would still be nice to
> make sure they cannot get out of sync. I'm still curious what the
> performance impact is for mmap-ing N versus N+8MB.
4. Print some cache statistics in "count-objects -v"
>> The length of SHA-1 is chosen to be able to unambiguously identify any
>> cached commits. Full SHA-1 check is done after to catch false
> Just to be clear, these false positives come because the abbreviation is
> unambiguous within the packfile, but we might be looking for a commit
> that is not even in our pack, right?
It may even be ambiguous within the pack: say, an octopus (i.e. not
cached) commit that shares the same SHA-1 prefix with one of the
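
A rough sketch of the lookup being discussed: binary-search the abbreviated SHA-1s, then confirm the hit against the full SHA-1 in the .idx list. The struct layout and the names `commit_cache` and `cache_lookup` are made up for illustration and do not match the actual patch.

```c
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical in-memory view of the cache; field names are assumptions. */
struct commit_cache {
	size_t nr;                     /* number of cached commits */
	size_t abbrev_len;             /* bytes of SHA-1 kept per entry */
	const unsigned char *abbrevs;  /* nr * abbrev_len bytes, sorted */
	const uint32_t *idx_pos;       /* position of each entry in the .idx list */
	const unsigned char *idx_sha1; /* sorted full SHA-1s from the .idx */
};

/* Return the cache slot for sha1, or -1 if that commit is not cached. */
static long cache_lookup(const struct commit_cache *c,
			 const unsigned char *sha1)
{
	size_t lo = 0, hi = c->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, c->abbrevs + mid * c->abbrev_len,
				 c->abbrev_len);
		if (cmp < 0) {
			hi = mid;
		} else if (cmp > 0) {
			lo = mid + 1;
		} else {
			/*
			 * Prefix matched. Abbreviations are unique among
			 * cached commits, so this is the only candidate;
			 * the full comparison rejects false positives.
			 */
			const unsigned char *full = c->idx_sha1 +
				(size_t)c->idx_pos[mid] * SHA1_LEN;
			return memcmp(sha1, full, SHA1_LEN) ? -1 : (long)mid;
		}
	}
	return -1;
}
```

The full comparison covers both cases mentioned in this thread: an object that is not in the pack at all, and an uncached object in the pack that happens to share the prefix of a cached commit.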