On Thu, Sep 27, 2012 at 10:45:32AM -0700, Shawn O. Pearce wrote:
> On 2012-08-12 Nguyen Thai Ngoc Duy <pclo...@gmail.com> wrote:
> > Long term, we might gain a slight lookup speedup if we know the object
> > type, since the search region is made smaller. But for that to happen, we
> > need to propagate an object type hint down to find_pack_entry_one() and
> > friends. Possible thing to do, I think.
> I'm not sure reclustering the index by object type is going to make a
> worthwhile difference. Of 2.2m objects in the Linux tree, 320k are
> commits. The difference between a binary search through all objects
> vs. just commits is only about 2 to 3 more iterations of binary search,
> if we assume the per-type ranges have their own fan-out tables.
To me the big win would be implicit indexing for items that are present
for every instance of a particular object type. So if we wanted to keep
the timestamp for every commit, we could have a "pack-*.timestamps"
that is literally just a packed list of uint32's, one per commit, where
the position of a commit's timestamp in the list is the same as its
position in the index of sha1s in the pack index.
That's simple to do if your index is just commits. But if it includes
all objects, then your list is sparse. So either you waste space by
making an empty slot for the non-commit objects, or you have an extra
level of indirection mapping the commit into the packed list, which is
going to double the storage in this case (though you could reuse that
extra mapping for the parent, generation number, etc, so it at least
gets amortized as you store more data). Or is there some clever solution?
For your extension, I don't think it matters. You're sparse even in the
commit-object space, so you have to store the mapping anyway. And your
data is big enough that the overhead isn't too painful.