On Thu, Jan 31, 2013 at 06:06:56PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> > Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> > space/time trade-off.
> Following the on-disk format experiment yesterday, I changed the
> format to:
>  - a list a _short_ SHA-1 of cached commits
>  - a list of cache entries, each (5 uint32_t) consists of:
>    - uint32_t for the index in .idx sha-1 table to get full SHA-1 of
>      the commit
>    - uint32_t for timestamp
>    - uint32_t for tree, 1st and 2nd parents for the index in .idx
>      table

Thanks for working on this, as it was the next step I was going to take. :)

The short-sha1 is a clever idea. Looks like it saves us on the order of
4MB for linux-2.6 (versus the full 20-byte sha1). Not as big as the
savings we get from dropping the other 3 sha1's to uint32_t, but still
not bad.

I guess the next steps in iterating on this would be:

  1. splitting out the refactoring here into separate patches

  2. squashing the cleaned-up bits into my patch 4/6

  3. deciding whether this should go into a separate file or as part of
     index v3. Your offsets depend on the .idx file having a sorted sha1
     list. That is not likely to change, but it would still be nice to
     make sure they cannot get out of sync. I'm still curious what the
     performance impact is for mmap-ing N versus N+8MB.

> The length of SHA-1 is chosen to be able to unambiguously identify any
> cached commits. Full SHA-1 check is done after to catch false
> positives.

Just to be clear, these false positives come because the abbreviation is
unambiguous within the packfile, but we might be looking for a commit
that is not even in our pack, right?

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to