On Mon, Aug 13, 2012 at 2:49 AM, Junio C Hamano <gits...@pobox.com> wrote:
> For example, the reachability bitmap would want to say something
> like "Traversing from commit A, these objects in this pack are
> reachable."  The bitmap for one commit A would logically consist of
> N bits for a packfile that stores N objects (the resulting bitmap
> needs to be compressed before going to disk, perhaps with RLE or
> something).  With the single "sorted by SHA-1" table, we can use the
> index in that single table to enumerate all reachable objects of any
> type in one go.  With four separate tables, on the other hand, we
> would need four bitmaps per commit.

No, we still need only one bitmap per commit. The n-th bit corresponds
to the n-th object in pack order, not index order, so how the SHA-1s
are sorted does not matter.
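
To illustrate, here is a minimal sketch in Python. The six-object pack,
the object names, and the helper names are all made up for the example;
this is not git code. Bits are assigned by position in the pack, so a
single bitmap covers every object type, and the per-type index tables
are never consulted.

```python
# Hypothetical toy pack: object names listed in pack order (made-up data).
pack_order = ["c2", "t2", "b2", "c1", "t1", "b1"]
obj_type = {
    "c1": "commit", "c2": "commit",
    "t1": "tree", "t2": "tree",
    "b1": "blob", "b2": "blob",
}

# Position of each object within the pack; this alone defines bit numbers.
pack_pos = {name: i for i, name in enumerate(pack_order)}

# Index tables split by type, each sorted by name. They are built here only
# to show that the bitmap code below never needs them.
tables = {
    t: sorted(n for n in pack_order if obj_type[n] == t)
    for t in ("commit", "tree", "blob")
}


def make_bitmap(reachable):
    """Return an int whose n-th bit is set if the n-th packed object is reachable."""
    bits = 0
    for name in reachable:
        bits |= 1 << pack_pos[name]
    return bits


def objects_in(bitmap):
    """List the objects whose bit is set, in pack order."""
    return [name for i, name in enumerate(pack_order) if (bitmap >> i) & 1]


# One bitmap for commit c1, covering a commit, a tree and a blob at once.
bitmap_c1 = make_bitmap({"c1", "t1", "b1"})
```

Here `objects_in(bitmap_c1)` gives back the commit, tree and blob from
one bitmap, regardless of how the index tables are split or sorted.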

> Either way is _possible_, but I think the former is simpler, and the
> latter makes it harder to introduce new types of objects in the
> future, which I do not think we have examined possible use cases
> well enough to make that decision to say "four types is enough
> forever".

New types can be put in one of those four tables, depending on their
purpose. I split the tables because I care particularly about commits
and trees. If a new type serves the same purpose as a tree, for
example, then it is better put in the tree table...

> In either way, we would have such bitmap (or a set of four bitmaps
> in your case) for more than one commit (it is not necessary or
> desirable to add the reachability bitmap to all commits), and such a
> "reachability extension" would need to store a sequence of "the
> commit object name the bitmap (or a set of four bitmaps) is about,
> and the bitmap (or set of four bitmaps)".  That object name does not
> have to be 20-byte but would be a varint representation of the
> offset into the "sorted by SHA-1" table.

How do you reach the bitmap, given its commit sha-1?

> That varint representation
> would be smaller by about 3.5 bits if you have a separate "commit
> only, sorted by SHA-1" table (as the number of all objects tend to
> be 10x larger than the number of all commits that need them).  For
> the particular case of "we want to only annotate the commits, never
> other kinds of objects" use case, it would be a win.  But without
> knowing what other use cases we will want to use the "object
> annotation in the pack index file" mechanism for, it feels like a
> premature optimization to me to have four tables to shave 3.5 bits
> per object.
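
For reference, the size of that saving can be checked with a quick
calculation. This is a rough sketch in Python, assuming the 10:1
object-to-commit ratio mentioned above, a hypothetical pack of one
million objects, and a plain varint with 7 payload bits per byte (an
assumption for illustration, not necessarily the encoding git would
use). The function names are made up.

```python
import math


def varint_len(n):
    """Bytes needed to encode non-negative n with 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def index_bits(table_size):
    """Bits of information needed to address one row of a table."""
    return math.log2(table_size)


objects = 1_000_000        # hypothetical number of objects in the pack
commits = objects // 10    # assumed 10:1 ratio of objects to commits

# Difference in index width between the all-objects and commit-only tables.
saved_bits = index_bits(objects) - index_bits(commits)

# Whole-byte cost of the largest index in each table.
bytes_all = varint_len(objects - 1)
bytes_commits = varint_len(commits - 1)
```

With these numbers `saved_bits` is log2(10), roughly 3.3 bits, which is
in line with the figure quoted. A byte-oriented varint only gets shorter
when the value crosses a 7-bit boundary, and in this example both
`bytes_all` and `bytes_commits` come out at 3.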

One other use case is caching trees for faster traversal in the general
case (a sort of pack v4, but as a cache instead of a replacement for
the real pack).
-- 
Duy