On Thu, Aug 10, 2017 at 01:49:24PM -0700, Junio C Hamano wrote:
> The lower 4-byte of moff (before incrementing it with msize) were
> already encoded to the output stream before this hunk. Shouldn't
> we be checking if moff would fit in uint32_t _before_ that happens?
moff is otherwise only decremented or assigned with an offset generated by
create_delta_index. These offsets are limited by 4GB.
Any larger offets would be a programming bug - so qualify for just a "assert".
> IOW, in a much earlier part of that "while (data < top)" loop, there
> is a code that tries to find a match that gives us a large msize by
> iterating over index->hash, and it sets msize and moff as a potential
> location that we may want to use. If moff is already big there, then
> we shouldn't consider it a match because we cannot encode its location
> using 4-byte anyway, no?
Recovering from an incorrect moff would add more complex code for a case, that
should not happen. A bailout would be sufficient.
> Cutting it off at here by resetting msize to 0 might help the next
> iteration (I didn't check, but is the effect of it is to corrupt the
> "val" rolling checksum and make it unlikely that the hash
> computation would not find a correct match?) but it somehow feels
> like closing the barn door after the horse has already bolted...
The current code produces incorrect deltas - its not just a checksum issue.
By the way:
Somebody interested in JGIT should also look at these two bugs:
copy would also encode beyond 4GB - producing truncated delta offset.
apply uses int for decoding length values.