Hi Mark,

On Tue, Feb 20, 2024 at 5:23 PM Mark Wielaard <m...@klomp.org> wrote:
>
> > As for the number of aranges found, there is a difference for libxul.so:
> > 250435 with the patch compared to 254832 without.  So 4397 fewer aranges
> > are found when using the new CU iteration method.  I'll dig into this and
> > see if there is a problem or if it's just due to some redundancy in
> > libxul's .debug_aranges.  FWIW there was no change to the aranges counts
> > for the other modules searched during this eu-stack firefox corefile test.
>
> A quick way to see where the differences are is using
> eu-readelf --debug-dump=decodedaranges before/after your patch.
>
> This is opposite to what I expected. I had expected there to be more,
> instead of less ranges. The difference is less than 2%. But still
> interesting to know what/why.
>
> Were there any differences in the backtraces? If not, then those
> ranges might not actually have been mapping to code.

The backtraces were identical.  I took a closer look at this and the
difference is due to clang including DW_AT_location addresses in
.debug_aranges.  This patch just looks for DW_AT_{high,low}_pc and
DW_AT_ranges (via dwarf_ranges) when generating the aranges list,
so these DW_AT_location addresses aren't included.

According to David Blaikie [1], "GCC does not include globals in
debug_aranges, so they probably can't be relied upon unfortunately
(unfortunate that Clang pays the cost when it emits them, but consumers
can't rely on them)".

Since consumers already can't assume that .debug_aranges includes
DW_AT_location addresses, I think it's reasonable for us to not include
them when dynamically generating aranges.  Especially since there could
be a noticeable performance cost to doing so.

A separate issue I saw is that some entries in .debug_aranges with
a decoded start address of "000000000000000000" wouldn't always
have matching entries in .debug_ranges.  This resulted in some dynamic
aranges not matching their corresponding .debug_aranges entry.

For example, the following decoded arange is in libxul's .debug_aranges:

    start: 000000000000000000, length:    13, CU DIE offset: 95189496

In the .debug_info for this CU, there is a corresponding subprogram with
low_pc 0 and high_pc 13.

However in the .debug_ranges entry for this CU, there is no range starting
at address 0 with size of at least 13.  The closest range list entry is:

    range 1, 1
    +0x0000000000000001 <__ehdr_start+0x1>..
    +000000000000000000 <libxul.so>

When we dynamically generate aranges this results in the following
decoded arange:

    start: 0x0000000000000001, length:     0, CU DIE offset: 95189496

So the start address is off by 1 and the length doesn't match the
.debug_aranges entry.  In some similar cases there wouldn't even be
a corresponding "range 1, 1" entry in .debug_ranges.  I'm not yet sure
what's going on here.

>
> > > Might it be an idea to leave dwarf_getaranges as it is and introduce a
> > > new (internal) function to get "dynamic" ranges? It looks like what
> > > programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> > > and dwfl_module_addrdie. These are currently build on dwarf_getaranges,
> > > but could maybe use a new interface?
> >
> > IMO this depends on what users expect from dwarf_getaranges.  Do they
> > want the exact contents of .debug_aranges (whether or not it's complete)
> > or should dwarf_getaranges go beyond .debug_aranges to ensure the most
> > complete results?
> >
> > The comment for dwarf_getaranges in libdw.h simply reads "Return list
> > address ranges".  Since there's no mention of .debug_aranges specifically,
> > I think it's fair if dwarf_getaranges does whatever it can to ensure
> > comprehensive results.  In which case dwarf_getaranges should probably
> > dynamically generate aranges.
>
> You might be right that no user really cares. But as seen in the
> eu-readelf code, it might also be that people expected it to map to
> the ranges from .debug_aranges.
>
> So I would be happier if we just kept the dwarf_getaranges code as
> is. And just change the code in dwarf_addrdie and dwfl_module_addrdie.
>
> We could then also introduce a new public function, dwarf_getdieranges
> (?) that does the new thing. But it doesn't have to be public on the
> first try as long as dwarf_addrdie and dwfl_module_addrdie work. (We
> might want to change the interface of dwarf_getdieranges so it can be
> "lazy" for example.)

Ok this approach seems like the most flexible.  Users can have both
.debug_aranges and CU-based aranges plus we don't have to change the
semantics of dwarf_getaranges.  I'll submit a revised patch for this.

Aaron

[1] https://reviews.llvm.org/D123538#inline-1188522

Reply via email to