>> In this patch, I chose to try to handle weird leb128 encodings by
>> preserving their values when possible; or returning the maximum value
>> on overflow. It isn't clear to me that this is necessarily the best
>> choice.
Mark> I think it is a good thing to accept "padded" leb128 numbers, but
Mark> probably up to a point. The padding will probably be either mod-4 or
Mark> mod-8 bytes. So let's at most read up to 16 bytes total.
Now I think all this is just overkill. Actually-existing leb128
relocations (see commit 7d81bc93 in binutils) seem to shrink the
representation. There was some talk in the Rust leb128 decoder issues
about this topic, but that was a wish-list item, possibly for wasm -- so
not an actual need.
>> + const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;
Mark> Shouldn't this be a const unsigned char?
The C "usual arithmetic conversions" will promote for the "&" anyway.
Mark> Is this really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1
Mark> So we are really only interested in the lowest bit?
Yeah. We could hard-code that if you prefer, but I wrote it this way so
that the expression stays correct if the types in the code are ever
changed (widened) some day.
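To make both points concrete, here is a throwaway sketch (not the
patch's code; I'm assuming acc is the uint64_t accumulator and max
counts the 7-bit groups read before the final byte, which is what the
numbers above imply):

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    uint64_t acc = 0;                /* stand-in for the accumulator type  */
    const unsigned max = 9;          /* 9 groups * 7 bits = 63 bits so far */

    /* The final (10th) byte can only contribute 64 - 63 = 1 more bit,
       so today this evaluates to (1u << 1) - 1 == 1.  Writing it in
       terms of sizeof (acc) keeps the line right if acc (and max) are
       ever widened.  */
    const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;

    /* The promotion point: a narrower mask type gives the same result,
       because both operands of "&" are promoted before the operation.  */
    const unsigned char cmask = mask;
    const unsigned char byte = 0x81; /* some hypothetical final byte       */

    printf ("mask = %u, byte & mask = %u, byte & cmask = %u\n",
            mask, byte & mask, (unsigned) (byte & cmask));
    return 0;
  }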
>> + if (unlikely (value == UINT64_MAX))
>> + return INT64_MAX;
Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the
Mark> representation of -1?
Yes. Normally in sleb128 I guess -1 would be written {0x7f}. However,
one could also write it with any number of 0xff prefix bytes first.
This weirdness is one reason I changed my mind about handling over-long
leb128 sequences.
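Concretely (my reading of the encoding, not anything from the patch):

  -1, minimal:  { 0x7f }                   -> accumulator 0xffffffffffffffff
  -1, padded:   { 0xff, 0xff, ..., 0x7f }  -> same accumulator value

so a correctly decoded -1 already fills the whole uint64_t, and
UINT64_MAX cannot double as an overflow sentinel for sleb128.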
I guess the only solution here is to write two copies of each function.
Mark> But what I don't fully understand is how this works with a "padded"
Mark> leb128, won't we possibly forget to negate in that case?
It isn't clear to me what to do when a sleb128 specifies more than 64
bits. Another reason perhaps to avoid this area.
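For reference, here is a minimal standalone sleb128 decoder (my own
sketch, nothing to do with the patch) showing where the negation
happens and how the "more than 64 bits" corner creeps in:

  #include <stdint.h>
  #include <stddef.h>

  /* Decode a signed LEB128 from BUF (LEN bytes) into *RESULT.
     Returns the number of bytes consumed, or 0 on truncated input.  */
  size_t
  decode_sleb128 (const unsigned char *buf, size_t len, int64_t *result)
  {
    uint64_t acc = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++)
      {
        unsigned char b = buf[i];
        if (shift < 64)
          acc |= (uint64_t) (b & 0x7f) << shift;
        /* else: bits beyond the 64th are silently dropped -- this is
           exactly the "more than 64 bits" case where it is not clear
           what the right behavior is.  */
        shift += 7;
        if ((b & 0x80) == 0)
          {
            /* Negate (sign-extend) from the sign bit of the final
               group.  For the padded -1 { 0xff, 0x7f } the sign bit is
               in the second byte and this still negates correctly.
               Once shift >= 64 the shift below would be undefined, so
               it is skipped; by then any 0xff padding bytes have
               already set the high bits anyway.  */
            if (shift < 64 && (b & 0x40) != 0)
              acc |= -((uint64_t) 1 << shift);
            *result = (int64_t) acc;
            return i + 1;
          }
      }
    return 0;
  }

In this shape the padded forms of -1 come out right more or less by
accident, which is part of why I'd rather not promise anything here.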
Tom