Hi Tom, On Mon, 2020-10-26 at 16:54 -0600, Tom Tromey wrote: > > > In this patch, I chose to try to handle weird leb128 encodings by > > > preserving their values when possible; or returning the maximum value > > > on overflow. It isn't clear to me that this is necessarily the best > > > choice. > > Mark> I think it is a good thing to accept "padded" leb128 numbers, but > Mark> probably up to a point. The padding will probably be either mod-4 or > Mark> mod-8 bytes. So lets at most read up to 16 bytes total. > > Now I think all this is just overkill. Actually-existing leb128 > relocations (see commit 7d81bc93 in binutils) seem to shrink the > representation. There was some talk in the Rust leb128 decoder issues > about this topic, but that was a wish-list thing possibly for wasm -- so > not exactly a need.
OK. Lets drop it then. It wasn't really supported earlier (not for more than 10 bytes anyway). So it shouldn't regress anything. > Mark> Is the really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1 > Mark> So we are really only interested in the lowest bit? > > Yeah. We could hard-code that if you prefer, but I wrote it this way to > pretend that maybe the types in the code would be changed (widened) some > day. I would like at least a comment. It took me a while to see it really was just 1, and it might not be immediately clear why that is the correct value. > > > + if (unlikely (value == UINT64_MAX)) > > > + return INT64_MAX; > > Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the > Mark> representation of -1? > > Yes. Normally in sleb128 I guess -1 would be written {0x7f}. However > one could write it with any number of 0xff prefix bytes first. This > weirdness is one reason I changed my mind on handling over-long leb128 > sequences. > > I guess the only solution here is to write 2 copies of each function. That is what I would do, simply because I get headaches when trying to reason about these kind of signed/unsigned conversions. Of course you would still have some of that in the sleb128 variant because you'll have to use unsigned arithmetic and cast to signed at the end. > Mark> But what I don't fully understand is how this works with a > "padded" > Mark> leb128, won't we possibly forget to negate in that case? > > It isn't clear to me what to do when a sleb128 specifies more than 64 > bits. Another reason perhaps to avoid this area. There is not much we can do, except flag it as overflow. Any constant value in DWARF that represents a > 64bit value is problematic (see for example the DWARF5 DW_FROM_data16, which the spec claims is a "constant", but we have to treat it as block form). Cheers, Mark