On Fri, 6 Sep 2013, Junio C Hamano wrote:
> Nicolas Pitre <n...@fluxnic.net> writes:
> > OK. If I understand correctly, the committer time stamp is more
> > important than the author's, right?
> Yeah, it matters a lot more when doing timestamp based traversal
> without the reachability bitmaps.
> > ... may I suggest keeping the tree reference first. That
> > is easy to skip over if you don't need it,...
> > ... Whereas, for a checkout where only the tree info is needed, if it is
> > located after the list of parents, then the above needs to be done for
> > all those parents and the committer time.
> Hmm. I wonder if that is a really good trade-off.
> "checkout" is to parse a single commit object and grab the "tree"
> field, while "log" is to parse millions of commit objects to grab
> their "parents" and "committer timestamp" fields ("log path/spec"
> needs to grab "tree", too, so that does not make "tree" extremely
> uncommon compared to the other two fields, though).
> I dunno.
I've therefore settled in the middle. The patch description now looks like this:
| This goes as follows:
| - Tree reference: either variable length encoding of the index
| into the SHA1 table or the literal SHA1 prefixed by 0 (see
| - Parent count: variable length encoding of the number of parents.
| This is normally going to occupy a single byte but doesn't have to.
| - List of parent references: a list of encode_sha1ref()-encoded
| references, or nothing if the parent count was zero.
| - Committer time stamp: variable length encoded. Year 2038 ready!
| Unlike the canonical representation, this is stored close to the
| list of parents so the important data for history traversal can be
| retrieved without parsing the rest of the object.
| - Committer reference: variable length encoding of an index into the
| ident dictionary table which also covers the time zone. To make
| the overall encoding efficient, the ident table is sorted by usage
| frequency so the most used entries are first and require the shortest
| index encoding.
| - Author time stamp: encoded as a difference against the committer
| time stamp, with the LSB used to indicate that the commit time is
| behind the author time.
| - Author reference: same as committer reference.
| The remainder of the canonical commit object content is then zlib
| compressed and appended to the above.
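To make the field order above concrete, here is a minimal C sketch of decoding the traversal fields and of the author time stamp delta. It is an illustration under stated assumptions, not the actual pack v4 code: the varint routines follow the MSB-continuation format of git's varint.c, while the sha1ref convention (0 means a literal 20-byte SHA1 follows, any other value n means table index n - 1), the struct names, MAX_PARENTS and parse_traversal_fields() are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PARENTS 16 /* hypothetical cap, only for this sketch */

/* Varint in the style of git's varint.c: big-endian 7-bit groups,
 * MSB set means "more bytes follow", with a +1 adjustment per
 * continuation byte so each value has exactly one encoding. */
static uint64_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uint64_t val = c & 127;
	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

static int encode_varint(uint64_t value, unsigned char *out)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;
	varint[pos] = value & 127;
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	memcpy(out, varint + pos, sizeof(varint) - pos);
	return sizeof(varint) - pos;
}

/* Hypothetical sha1ref: 0 => literal SHA1 follows, n => index n - 1. */
struct sha1ref {
	int is_literal;
	uint64_t index;
	unsigned char sha1[20];
};

static void decode_sha1ref(const unsigned char **bufp, struct sha1ref *ref)
{
	uint64_t v = decode_varint(bufp);
	if (v == 0) {
		ref->is_literal = 1;
		memcpy(ref->sha1, *bufp, 20);
		*bufp += 20;
	} else {
		ref->is_literal = 0;
		ref->index = v - 1;
	}
}

struct commit_header {
	struct sha1ref tree;
	uint64_t nr_parents;
	struct sha1ref parents[MAX_PARENTS];
	uint64_t committer_time;
};

/* Decode only what history traversal needs; the ident references,
 * author time and zlib-compressed remainder are left unread. */
static int parse_traversal_fields(const unsigned char **bufp,
				  struct commit_header *h)
{
	decode_sha1ref(bufp, &h->tree);
	h->nr_parents = decode_varint(bufp);
	if (h->nr_parents > MAX_PARENTS)
		return -1;
	for (uint64_t i = 0; i < h->nr_parents; i++)
		decode_sha1ref(bufp, &h->parents[i]);
	h->committer_time = decode_varint(bufp);
	return 0;
}

/* Author time as a delta against committer time; LSB set when the
 * commit time is behind (earlier than) the author time. */
static uint64_t encode_author_delta(uint64_t committer, uint64_t author)
{
	if (committer >= author)
		return (committer - author) << 1;
	return ((author - committer) << 1) | 1;
}

static uint64_t decode_author_time(uint64_t committer, uint64_t v)
{
	uint64_t delta = v >> 1;
	return (v & 1) ? committer + delta : committer - delta;
}
```

Since the author time normally precedes the commit time, the LSB is usually clear and the stored value is just twice the gap, which keeps the varint short in the common case. A caller doing timestamp-based traversal can stop right after parse_traversal_fields() without touching the compressed remainder.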
I also updated the documentation patch accordingly in my tree.