On Thu, Sep 5, 2013 at 10:36 PM, Jeff King <p...@peff.net> wrote:
>> > This is going to screw up pack v4 (yes, someday I'll have the
>> > time to make it real).
>> I don't know if this is still true, but given that patches are
>> being sent out about it, I thought it relevant.
> I haven't looked carefully at the pack v4 patches yet, but I suspect
> that yes, it's still a problem. The premise of pack v4 is that we can do
> better by not storing the raw git object bytes, but rather storing
> specialized representations of the various components. For example, by
> using an integer to store the mode rather than the ascii representation.
> But that representation does not represent the "oops, I have a 0-padded
> mode" quirk. And we have to be able to recover the original object, byte
> for byte, from the v4 representation (to verify sha1, or to generate a
> loose object or v2 pack).
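To make the round-trip problem concrete, here is a rough Python sketch (not git code; tree_entry and object_id are names I made up, and the all-zero child id is just a placeholder). It builds the same tree entry once with a 0-padded mode and once from the integer value, and shows that the object name changes:

```python
import hashlib

def tree_entry(mode_ascii: bytes, name: bytes, oid: bytes) -> bytes:
    """Serialize one raw tree entry: "<mode> <name>\\0<20-byte sha1>"."""
    return mode_ascii + b" " + name + b"\0" + oid

def object_id(obj_type: bytes, body: bytes) -> str:
    """Object name: SHA-1 over "<type> <size>\\0" followed by the body."""
    header = obj_type + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

child = bytes(20)  # placeholder object id, for illustration only

# What the quirky writer stored on disk.
padded = tree_entry(b"040000", b"dir", child)

# What we get back if only the integer value of the mode is kept.
mode = int("040000", 8)
canonical = tree_entry(format(mode, "o").encode(), b"dir", child)

print(padded == canonical)  # the two bodies differ by one byte
print(object_id(b"tree", padded) == object_id(b"tree", canonical))
```

Both comparisons come out false, which is why the integer alone is not enough to verify the sha1 of such a tree.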
> There are basically two solutions:
> 1. Add a single-bit flag for "I am 0-padded in the real data". We
> could probably even squeeze it into the same integer.
> 2. Have a "classic" section of the pack that stores the raw object
> bytes. For objects which do not match our expectations, store them
> raw instead of in v4 format. They will not get the benefit of v4
> optimizations, but if they are the minority of objects, that will
> only end up with a slight slow-down.
3. Detect this situation and fall back to v2.
4. Update v4 to allow storing raw tree entries mixed with v4-encoded
tree entries. This is something between (1) and (2).
> As I said, I have not looked carefully at the v4 patches, so maybe they
> handle this case already. But of the two solutions, I prefer (2). Doing
> (1) can solve _this_ problem, but it complicates the format, and does
> nothing for any future compatibility issues. Whereas (2) is easy to
> implement, since it is basically just pack v2 (and implementations would
> need a pack v2 reader anyway).
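For reference, option (1) could look roughly like this in Python. PAD_FLAG, encode_mode and decode_mode are made-up names, and putting the flag in bit 16 is my assumption (the usual modes, up to 160000 octal, fit below that), not something taken from the v4 patches. It also assumes the only quirk is exactly one leading zero:

```python
PAD_FLAG = 1 << 16  # hypothetical: bit 16 is unused by the usual tree modes

def encode_mode(mode_ascii: bytes) -> int:
    """Pack an ascii octal mode into an integer, remembering one leading '0'."""
    value = int(mode_ascii, 8)
    canonical = format(value, "o").encode()
    if mode_ascii == canonical:
        return value
    if mode_ascii == b"0" + canonical:
        return value | PAD_FLAG
    # Any other spelling cannot be represented by a single flag bit.
    raise ValueError("unrepresentable mode: %r" % mode_ascii)

def decode_mode(field: int) -> bytes:
    """Recover the original ascii mode, byte for byte."""
    canonical = format(field & ~PAD_FLAG, "o").encode()
    return b"0" + canonical if field & PAD_FLAG else canonical

for original in (b"40000", b"040000", b"100644"):
    assert decode_mode(encode_mode(original)) == original
```

Note that something like b"0040000" raises ValueError here, which is the "does nothing for any future compatibility issues" point: one bit covers one known quirk and no others.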
I think (4) fits better in the v4 design and is probably not hard to
do. Nico recently added a code to embed a tree entry inline, but the
mode must be encoded (and can't contain leading zeros). We could have
another code to store the mode in ascii. This also makes me wonder if
we might have similar problems with timezones, which are also
specially encoded.
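A rough Python sketch of what I mean by (4), only to show the idea. The RAW/ENCODED codes and the tuple layout are invented for illustration and do not match the actual v4 encoding:

```python
RAW, ENCODED = 0, 1  # hypothetical per-entry codes, not from the v4 patches

def pack_entry(raw: bytes):
    """Encode one tree entry, keeping the raw bytes if the mode is odd."""
    mode, rest = raw.split(b" ", 1)   # the mode ends at the first space
    name, oid = rest.split(b"\0", 1)  # the name ends at the first NUL
    try:
        value = int(mode, 8)
    except ValueError:
        return (RAW, raw)
    if mode != format(value, "o").encode():
        return (RAW, raw)             # e.g. a 0-padded mode
    return (ENCODED, value, name, oid)

def unpack_entry(entry) -> bytes:
    """Recover the original entry bytes from either representation."""
    if entry[0] == RAW:
        return entry[1]
    _, value, name, oid = entry
    return format(value, "o").encode() + b" " + name + b"\0" + oid

oid = bytes(range(20))  # placeholder object id
good = b"100644 file.c\0" + oid
odd = b"040000 dir\0" + oid
assert pack_entry(good)[0] == ENCODED and pack_entry(odd)[0] == RAW
assert unpack_entry(pack_entry(good)) == good
assert unpack_entry(pack_entry(odd)) == odd
```

Only the odd entries lose the v4 treatment; everything else in the same tree is still encoded.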
(3) is probably easiest. We need to scan through all tree entries
first when creating v4 anyway. If we detect any anomalies, just switch
back to v2 generation. The user will be forced to rewrite history in
order to take full advantage of v4 (they can have a pack of weird
trees in v2 and the rest in a v4 pack, but that's not optimal).
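The scan for (3) could be as simple as the following Python sketch. The function names are made up, "anomaly" here only means a mode that is not plain octal without a leading zero, and it assumes well-formed tree bodies with 20-byte SHA-1 ids:

```python
import re

CANONICAL_MODE = re.compile(rb"[1-7][0-7]*")  # octal digits, no leading zero

def iter_modes(tree_body: bytes):
    """Yield the ascii mode of every entry in a raw tree object body."""
    pos = 0
    while pos < len(tree_body):
        space = tree_body.index(b" ", pos)
        nul = tree_body.index(b"\0", space + 1)
        yield tree_body[pos:space]
        pos = nul + 1 + 20  # skip the 20-byte binary SHA-1

def choose_pack_version(tree_bodies) -> int:
    """Return 4 if every mode is canonical, otherwise fall back to 2."""
    for body in tree_bodies:
        for mode in iter_modes(body):
            if not CANONICAL_MODE.fullmatch(mode):
                return 2
    return 4

oid = bytes(20)  # placeholder object id
clean = b"100644 a\0" + oid + b"40000 sub\0" + oid
weird = b"100644 a\0" + oid + b"040000 sub\0" + oid
assert choose_pack_version([clean]) == 4
assert choose_pack_version([clean, weird]) == 2
```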