https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118837
--- Comment #9 from Simon Marchi <simon.marchi at polymtl dot ca> ---
(In reply to Tom Tromey from comment #8)
> (In reply to Simon Marchi from comment #7)
>
> > ... so I am afraid that any attempt at addressing this problem will be met
> > with a "it's a quality of implementation issue", but I think it's worth
> > trying. I think it would be better if DWARF didn't allow being ambiguous.
>
> Even just making this a hard rule would be an improvement.
>
> IIRC in one of the old threads the answer was that producers and consumers
> should agree, but to me this is clearly a bad answer, since the DWARF
> standard is precisely the mechanism by which they agree.
Agreed :).
> > So far my understanding is the problem is: you could have an attribute,
> > let's say DW_AT_const_value, with form DW_FORM_data1, and value 0x80. As a
> > consumer, how do you know if that 0x80 means -128 or 128? You could have
> > compiler-1 people saying "it should obviously be interpreted as a signed
> > constant" and compiler-2 people saying "it should obviously be interpreted
> > as an unsigned constant". And then, as a consumer, you are in a pickle.
>
> Correct. And the decision varies based on context.
>
> > 1. The easy way: remove the DW_FORM_data<n> forms from the constant class.
> > This only leaves DW_FORM_udata and DW_FORM_sdata, which define the
> > signedness explicitly. The advantages: it's an easy change for everybody
> > (in the spec, in producers, in consumers). How many ways of describing a
> > constant does DWARF really need? The downside is obviously a possible
> > increase in debug info size. But would it be significant? I would like to
> > prototype it and see how many values in a real-world DWARF file would now
> > take an extra byte because of this.
>
> Unfortunately DWARF seems to really love these space-saving
> micro-optimizations.
> Personally I think sleb/uleb is enough for nearly everything (basically all
> values not involving relocations). But, e.g., DWARF added DW_FORM_strx3,
> I guess to save one byte sometimes?
>
> Anyway one problem with this approach is that it provides no guidance
> for DWARF 3-5. Still, it would be hugely better, be easy to implement, etc.
> But I guess would be a pretty big change from existing practice.
I don't think we can really fix DWARF 3-5. Even if we did provide some
retroactive guidance on what the producers should have done with DWARF 3-5, the
compilers are out there already producing DWARF.
> > 2. A more complicated way: for each attribute that can be of the constant
> > class, define a default signedness (I imagine an extra column in Table 7.5:
> > Attribute encodings). If the form does not specify the signedness (i.e.
> > DW_FORM_data<n>), then the consumer would refer to that table to know if the
> > value should be treated as signed or unsigned.
>
> This is more or less the approach I took to fixing this in gdb: I went
> through every spot and tried to determine the correct answer. I don't
> think I quite finished.
>
> And there are spots that are "confused". That is, compilers in practice
> will emit a DW_FORM_sdata if the value in question is signed, but will
> emit DW_FORM_data1 and expect this to be zero-extended. This, to me,
> undermines the idea that the value or the context is "signed" or "unsigned".
>
> The main problem with this approach is that the answer doesn't just depend
> on the tag or the attribute. It can depend on other DIEs as well, for
> instance I believe a variant part's discriminant value is sign-extended,
> or not, depending on the type of the relevant field. This of course is
> difficult to implement, test, etc.
Regarding variants, DWARF 5 already says:
The value that selects a given variant may be represented in one of
three ways. The variant entry may have a DW_AT_discr_value attribute whose
value represents the discriminant value selecting this variant. The value of
this attribute is encoded as an LEB128 number. The number is signed if the
tag type for the variant part containing this variant is a signed type. The
number is unsigned if the tag type is an unsigned type.
In other words, despite DW_AT_discr_value being of the class "constant",
according to the attribute encoding table (which in theory allows
DW_FORM_data*), the text forces the use of DW_FORM_udata/DW_FORM_sdata,
depending on the signedness of the discriminant.
But even if DW_FORM_data* forms were allowed here, I think that the "default
signedness" column I imagine could say "depends on the signedness of the
discriminant".
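
As a rough illustration of the rule quoted above, here is a small Python sketch
of how a consumer could pick the LEB128 decoder from the signedness of the
discriminant's type. It is not taken from GDB or any other consumer; the
function names and the `discriminant_is_signed` parameter are made up for the
example.

```python
def decode_uleb128(data: bytes) -> int:
    """Decode an unsigned LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:  # high bit clear: this was the last byte
            return result
    raise ValueError("truncated LEB128 value")


def decode_sleb128(data: bytes) -> int:
    """Decode a signed LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:  # sign bit of the last byte: sign-extend
                result -= 1 << shift
            return result
    raise ValueError("truncated LEB128 value")


def decode_discr_value(data: bytes, discriminant_is_signed: bool) -> int:
    """Hypothetical helper: choose the decoder from the discriminant's type."""
    if discriminant_is_signed:
        return decode_sleb128(data)
    return decode_uleb128(data)


# The same single byte gives two different discriminant values.
print(decode_discr_value(b"\x7f", discriminant_is_signed=True))
print(decode_discr_value(b"\x7f", discriminant_is_signed=False))
```

The point is only that the bytes alone are not enough: the consumer has to
look at another DIE (the discriminant's type) before it can decode the value.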
I hadn't thought about relocations needing fixed-size values. Do values of the
"constant" class ever need to be relocated? I would guess not.
Regarding the space saving: I tried to modify gcc to always emit DW_FORM_udata
for unsigned constants and did a GDB build. The .debug_info size increased
from 151,601,709 bytes to 152,605,488 bytes (+0.66%). I would need to try to
force DW_FORM_udata only for the few "confused" attributes, but I am not good
enough at gcc to try that yet.
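
To give an idea of where the extra bytes come from, here is a Python sketch
comparing the encoded size of an unsigned constant as DW_FORM_udata (ULEB128)
against the smallest DW_FORM_data<n> that can hold it. It assumes a producer
that would otherwise always pick the smallest fixed-size form, which is an
assumption for the example and not a description of what gcc does. The
function names are made up.

```python
def uleb128_size(value: int) -> int:
    """Number of bytes needed to encode a non-negative value as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    size = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def smallest_data_form_size(value: int) -> int:
    """Size of the smallest DW_FORM_data<n> (n in 1, 2, 4, 8) holding value."""
    if value < 0:
        raise ValueError("expected a non-negative value")
    for size in (1, 2, 4, 8):
        if value < 1 << (8 * size):
            return size
    raise ValueError("value does not fit in 8 bytes")


def extra_bytes(value: int) -> int:
    """Bytes gained (positive) or saved (negative) by switching to ULEB128."""
    return uleb128_size(value) - smallest_data_form_size(value)


for v in (0x7F, 0x80, 0xFF, 0x100, 0x4000, 0x10000):
    print(hex(v), uleb128_size(v), smallest_data_form_size(v), extra_bytes(v))
```

Under that assumption, values from 0x80 to 0xff and from 0x4000 to 0xffff cost
one extra byte, while some values (for instance 0x10000) actually get smaller,
because there is no 3-byte data form.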
Maybe another solution would be to introduce fixed-size forms with explicit
signedness, DW_FORM_udata{1,2,4,8} and DW_FORM_sdata{1,2,4,8}.
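
A sketch of what reading such forms could look like on the consumer side, in
Python. The DW_FORM_udata<n> / DW_FORM_sdata<n> forms do not exist in any
DWARF version; they are only the idea floated above, and `decode_fixed` and
its parameters are made up for the example. Little-endian is assumed by
default.

```python
def decode_fixed(data: bytes, size: int, signed: bool,
                 byteorder: str = "little") -> int:
    """Decode a fixed-size constant whose signedness is given by the form.

    signed=False models a hypothetical DW_FORM_udata<size>,
    signed=True models a hypothetical DW_FORM_sdata<size>.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8")
    if len(data) < size:
        raise ValueError("not enough bytes for this form")
    return int.from_bytes(data[:size], byteorder=byteorder, signed=signed)


# The byte 0x80 from the original example is no longer ambiguous,
# because the form itself says how to extend it.
print(decode_fixed(b"\x80", 1, signed=False))
print(decode_fixed(b"\x80", 1, signed=True))
```

With the existing DW_FORM_data1, the consumer has to guess the `signed`
argument from context; with forms like these, the producer would state it.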