[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

David Blaikie via Phabricator via lldb-commits Fri, 04 Oct 2019 13:59:22 -0700

dblaikie added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
labath wrote:
> This parallel iteration is not completely nice, but I think it's worth being 
> able to reuse the absolute range computation code. I'm open to ideas for 
> improvement though.
Ah, I see - this is what you meant about "In particular it makes it possible to 
reuse this stuff in the dumping code, which would have been pretty hard with 
callbacks.".

I'm wondering if that might be worth revisiting somewhat. A full iterator 
abstraction for one user here (well, two once you include lldb - but I assume 
it's likely going to build its own data structure from the iteration anyway, 
right? (it's not going to keep the iterator around, do anything interesting 
like partial iterations, re-iterate/etc - such that a callback would suffice))

I could imagine two callback APIs for this - one that gets entries and 
locations and one that only gets locations by filtering on the entry version.

eg:

  // for non-verbose output:
  LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
    if (Verbose && actually dumping debug_loc)
      print(E) // print any LLE_*, raw parameters, etc
    if (L)
      print(*L) // print the resulting address range, section name (if 
verbose), 
    else
      print(error stuff)
  });

One question would be "when/where do we print the DWARF expression" - if 
there's an error computing the address range, we can still print the 
expression, so maybe that happens unconditionally at the end of the callback, 
using the expression in the Entry? (then, arguably, the expression doesn't need 
to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned 
range, exactly the same type as for ranges so that part of the dumping code, 
etc, can be maximally reused)

================
Comment at: test/CodeGen/X86/debug-loclists.ll:16
 ; CHECK-NEXT: 0x00000000:
-; CHECK-NEXT:  [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
-; CHECK-NEXT:  [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0
-
-; There is no way to use llvm-dwarfdump atm (2018, october) to verify the 
DW_LLE_* codes emited,
-; because dumper is not yet implements that. Use asm code to do this check 
instead.
-;
-; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s 
--check-prefix=ASM
-; ASM:      .section .debug_loclists,"",@progbits
-; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # 
Length
-; ASM-NEXT: .Ldebug_loclist_table_start0:
-; ASM-NEXT:  .short 5                              # Version
-; ASM-NEXT:  .byte 8                               # Address size
-; ASM-NEXT:  .byte 0                               # Segment selector size
-; ASM-NEXT:  .long 0                               # Offset entry count
-; ASM-NEXT: .Lloclists_table_base0:                
-; ASM-NEXT: .Ldebug_loc0:
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Lfunc_begin0-.Lfunc_begin0  # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 117                             # DW_OP_breg5
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp1-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 115                             # DW_OP_breg3
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 0                               # DW_LLE_end_of_list
-; ASM-NEXT: .Ldebug_loclist_table_end0:
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 
=> [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 
=> [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0
----------------
labath wrote:
> This tries to follow the RLE format as closely as possible, but I think 
> something like
> ```
> [DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => 
> [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
> ```
> would make more sense (both here and for RLE).
Yep, that'd make more sense to me - are you planning to unify the codepaths for 
this? I think that'd be for the best.

If I were picking a printing from scratch, I might go with:

  DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0

Making it look a bit more like a function call and function arguments. Though 
the () might be confusing with the range notation. 

I'm also undecided on the " => " separator. Whether a ':' might be better/fine, 
etc.

Totally open to ideas, but mostly I'd really love these to use loclist and 
ranges to use the same code as much as possible, so we can get consistency and 
any readability benefits, etc in both.

================
Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7
 # CHECK:       DW_AT_location [DW_FORM_sec_offset]   (0x0000000c
-# CHECK-NEXT:    [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0
-# CHECK-NEXT:    [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, 
DW_OP_deref
-# CHECK-NEXT:    [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_offset_pair  ]: 0x0000000000000000, 
0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_base_address ]: 0x0000000000000500
----------------
I don't think the inline dumping should print the encoding - I'd borrow a lot 
from/try to unify with the ranges printing, which doesn't. I think verbose 
ranges print the same as non-verbose except they also add the section 
name/number.

================
Comment at: test/DebugInfo/X86/fission-ranges.ll:48
 ; CHECK:      [[A]]:
-; CHECK-NEXT:   Addr idx 2 (w/ length 15): DW_OP_consts +0, DW_OP_stack_value
-; CHECK-NEXT:   Addr idx 3 (w/ length 15): DW_OP_reg0 RAX
-; CHECK-NEXT:   Addr idx 4 (w/ length 18): DW_OP_breg7 RSP-8
+; CHECK-NEXT:   [DW_LLE_startx_length]: 0x00000002, 0x0000000f => <Failed to 
read address offset 2> DW_OP_consts +0, DW_OP_stack_value
+; CHECK-NEXT:   [DW_LLE_startx_length]: 0x00000003, 0x0000000f => <Failed to 
read address offset 3> DW_OP_reg0 RAX
----------------
labath wrote:
> This is somewhat annoying, because the entries printed through the loclists 
> section will always have this error (as we don't have the DWARFUnit). I'll 
> have to figure out a way to suppress those, while still keeping them around 
> when printing from DWARFDie (as there a failure means a real error).
IMHO we may want to move to a model where we don't try to create/parse any 
content except by finding a reference from a CU (or the DWARFv5 stanfdalone 
line tables).

In theory, it's perfectly find to have random garbage in debug sections other 
than debug_info (or the standalone line table) - because the only parts that 
should be parsed are those referenced from debug_info.

This came up in the form of a bug in location list dumping when the binary is 
linked with bfd ld. It doesn't update any addresses to discarded sections, 
leaving them as zero (whereas gold and lld write the addend to the relocation - 
which generally makes sure any range pair doesn't end up as "zero zero" which 
marks the end of a list) which terminates a list early and leads to the 
following location expression to be parsed as the start of a new list... which 
is totally bogus.
Now, granted, the resulting debug info from bfd ld is wrong (if you had a 
location list spanning multiple functions (eg: a global variable had been put 
in a register for the duration of a function, etc) then resolving any one of 
those location entries to zero-zero would terminate the list early even though 
there might be non-dropped functions in the list after that point) - but I 
still think there's something to be said for it.

There's a fair counterargument too - that we might want to be able to make a 
best-effort to dump content that isn't complete (eg: if a section was emitted 
alone - or there was some hunk of unreferenced location list in the debug_loc 
section, it might be interesting to know what's in that hunk - might give you 
hints about where it /should/ have been referenced from)

Apparently binutils objdump when printing debug info only dumps those 
referenced pieces and prints info about "holes" when there's unreferenced 
chunks.

Ah, here's the bug context on that: https://bugs.llvm.org/show_bug.cgi?id=43290

But, yeah, all that aside - given the architecture of 
libDebugInfoDWARF/llvm-dwarfdump right now, yes, it'd be good to omit those 
error messages.

Also note that address indexes wouldn't be resolvable when dumping .dwo files - 
since the debug_addr would be in the .o file instead. So it'd be good to not 
print lots of error messages there either.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Reply via email to