On Thu, Sep 27, 2012 at 10:39 AM, Jeff King <p...@peff.net> wrote:
> On Thu, Sep 27, 2012 at 08:51:51AM -0700, Shawn O. Pearce wrote:
>> On Thu, Sep 27, 2012 at 5:17 AM, Nguyen Thai Ngoc Duy <pclo...@gmail.com> wrote:
>> > I'd like to see some sort of extension mechanism like in
>> > $GIT_DIR/index, so that we don't have to increase pack index version
>> > often. What I have in mind is optional commit cache to speed up
>> > rev-list and merge, which could be stored in pack index too.
>> Can you share some of your ideas?
> Some of it is here:
>   http://article.gmane.org/gmane.comp.version-control.git/203308

Quoting from that patch:

On 2012-08-12 Nguyen Thai Ngoc Duy <pclo...@gmail.com> wrote:
> Long term we might gain slight lookup speedup if we know object type
> as search region is made smaller. But for that to happen, we need to
> propagate object type hint down to find_pack_entry_one() and friends.
> Possible thing to do, I think.

I'm not sure reclustering the index by object type is going to make a
worthwhile difference. Of 2.2m objects in the Linux tree, 320k are
commits. The difference between doing the binary search through all
objects vs. just commits is only 2 or 3 more iterations of binary search
(log2(2.2m/320k) is about 2.8) if we assume the per-type ranges have
their own fan-out tables.
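
A rough way to check that estimate, as a sketch only: the helper name
and the assumption that objects spread evenly over a 256-entry fan-out
table are mine, not anything taken from the git source. The gap works
out to between two and three probes; the worst-case count below rounds
it to 3.

```python
import math


def binary_search_steps(n_objects, fanout_buckets=256):
    """Worst-case probes to find one id among n_objects.

    Assumes (hypothetically) that the ids are spread evenly over
    fanout_buckets, so the search only covers one bucket.
    """
    per_bucket = max(1, math.ceil(n_objects / fanout_buckets))
    # Worst case for binary search over m items is floor(log2(m)) + 1,
    # which for a positive integer m is exactly m.bit_length().
    return per_bucket.bit_length()


all_objects = binary_search_steps(2_200_000)   # every object in the index
commits_only = binary_search_steps(320_000)    # a commits-only range

print(all_objects, commits_only, all_objects - commits_only)  # 14 11 3
```

The result is the same with no fan-out table at all (22 vs. 19 probes),
since the fan-out divides both ranges by the same factor.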

> The main reason to group objects by type is to make it possible to
> create another sha1->something mapping for a particular object type,
> without wasting space for storing sha-1 keys again. For example, we
> can store commit caches, tree caches... at the end of the index as
> extensions.

Keying the extension by each object's ordinal position in the pack also
avoids storing the SHA-1 keys again, and doesn't require clustering
objects by type.
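
To make the ordinal idea concrete, here is a minimal sketch. Everything
in it is hypothetical: the short ids, the table names and the cached
payload are made up for illustration, and "ordinal" is taken to mean
the object's position in the index's sorted SHA-1 table (the same
approach would apply if it means the N-th object in pack order).

```python
import bisect

# Toy stand-in for the sorted SHA-1 table of a pack index (made-up ids).
SORTED_IDS = ["1a", "3c", "5e", "7f", "9b", "c4"]

# Hypothetical extension: the ordinals of the commits only, kept sorted,
# with a parallel list of cached data. No object id is stored twice.
COMMIT_ORDINALS = [1, 4]  # "3c" and "9b" are the commits in this toy pack
COMMIT_CACHE = [{"parents": []}, {"parents": ["3c"]}]


def ordinal_of(oid):
    """Return the position of oid in the sorted id table, or None."""
    i = bisect.bisect_left(SORTED_IDS, oid)
    if i < len(SORTED_IDS) and SORTED_IDS[i] == oid:
        return i
    return None


def cached_commit(oid):
    """Return cached commit data found via the object's ordinal, or None."""
    n = ordinal_of(oid)
    if n is None:
        return None
    j = bisect.bisect_left(COMMIT_ORDINALS, n)
    if j < len(COMMIT_ORDINALS) and COMMIT_ORDINALS[j] == n:
        return COMMIT_CACHE[j]
    return None


print(cached_commit("9b"))  # {'parents': ['3c']}
print(cached_commit("5e"))  # None: present in the pack, but not a commit
```

The extension's keys are small integers rather than 20-byte SHA-1s, and
the main index stays in plain SHA-1 order with objects of every type
interleaved.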
