dblaikie added inline comments.
================ Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295 + EntryIterator Absolute = + getAbsoluteLocations( + SectionedAddress{BaseAddr, SectionedAddress::UndefSection}, + LookupPooledAddress) + .begin(); ---------------- labath wrote: > dblaikie wrote: > > labath wrote: > > > This parallel iteration is not completely nice, but I think it's worth > > > being able to reuse the absolute range computation code. I'm open to > > > ideas for improvement though. > > Ah, I see - this is what you meant about "In particular it makes it > > possible to reuse this stuff in the dumping code, which would have been > > pretty hard with callbacks.". > > > > I'm wondering if that might be worth revisiting somewhat. A full iterator > > abstraction for one user here (well, two once you include lldb - but I > > assume it's likely going to build its own data structure from the iteration > > anyway, right? (it's not going to keep the iterator around, do anything > > interesting like partial iterations, re-iterate/etc - such that a callback > > would suffice)) > > > > I could imagine two callback APIs for this - one that gets entries and > > locations and one that only gets locations by filtering on the entry > > version. > > > > eg: > > > > // for non-verbose output: > > LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) { > > if (Verbose && actually dumping debug_loc) > > print(E) // print any LLE_*, raw parameters, etc > > if (L) > > print(*L) // print the resulting address range, section name (if > > verbose), > > else > > print(error stuff) > > }); > > > > One question would be "when/where do we print the DWARF expression" - if > > there's an error computing the address range, we can still print the > > expression, so maybe that happens unconditionally at the end of the > > callback, using the expression in the Entry? (then, arguably, the > > expression doesn't need to be in the DWARFLocation - and I'd say make the > > DWARFLocation a sectioned range, exactly the same type as for ranges so > > that part of the dumping code, etc, can be maximally reused) > Actually, what lldb currently does is that it does not build any data > structures at all (except storing the pointer to the right place in the > debug_loc section. Then, whenever it wants to do something to the loclist, it > parses it afresh. I don't know why it does this exactly, but I assume it has > something to do with most locations never being used, or being only a couple > of times, and the actual parsing being fairly fast. What this means is that > lldb is not really a single "user", but there are like four or five places > where it iterates through the list, depending on what does it actually want > to do with it. It also does partial iteration where it stops as soon as it > find the entry it was interested in. > Now, all of that is possible with a callback (though I am generally trying to > avoid them), but it does resurface the issue of what should be the value of > the second argument for DW_LLE_base_address entries (the thing which I > originally used a error type for). > Maybe this should be actually one callback API, taking two callback > functions, with one of them being invoked for base_address entries, and one > for others? However, if we stick to the current approaches in both LLE and > RLE of making the address pool resolution function a parameter (which I'd > like to keep, as it makes my job in lldb easier), then this would actually be > three callbacks, which starts to get unwieldy. Though one of those callbacks > could be removed with the "DWARFUnit implementing a AddrOffsetResolver > interface" idea, which I really like. :) Ah, thanks for the details on LLDB's location parsing logic. That's interesting indeed! I can appreciate an iterator-based API if that's the sort of usage we've got, though I expect it doesn't have any interest in the low-level encoding & just wants the fully processed address ranges/locations - it doesn't want base_address or end_of_list entries? & I think the dual-iteration is a fairly awkward API design, trying to iterate them in lock-step, etc. I'd rather avoid that if reasonably possible. Either having an iterator API that gives only the fully processed data/semantic view & a completely different API if you want to access the low level primitives (LLE, etc) (this is how ranges works - there's an API that gives a collection of ranges & abstracts over v4/v5/rnglists/etc - though that's partly motivated by a strong multi-client need for that functionality for symbolizing, etc - but I think it's a good abstraction/model anyway (& one of the reasons the inline range list printing doesn't include encoding information, the API it uses is too high level to even have access to it)) > Now, all of that is possible with a callback (though I am generally trying to > avoid them), but it does resurface the issue of what should be the value of > the second argument for DW_LLE_base_address entries (the thing which I > originally used a error type for). Sorry, my intent in the above API was for the second argument to be Optional's "None" state when... oh, I see, I did use Expected there, rather than Optional, because there are legit error cases. I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous. > Maybe this should be actually one callback API, taking two callback > functions, with one of them being invoked for base_address entries, and one > for others? However, if we stick to the current approaches in both LLE and > RLE of making the address pool resolution function a parameter (which I'd > like to keep, as it makes my job in lldb easier), then this would actually be > three callbacks, which starts to get unwieldy. Don't mind three callbacks too much. > Though one of those callbacks could be removed with the "DWARFUnit > implementing a AddrOffsetResolver interface" idea, which I really like. :) Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed). ================ Comment at: test/CodeGen/X86/debug-loclists.ll:16 ; CHECK-NEXT: 0x00000000: -; CHECK-NEXT: [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0 -; CHECK-NEXT: [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0 - -; There is no way to use llvm-dwarfdump atm (2018, october) to verify the DW_LLE_* codes emited, -; because dumper is not yet implements that. Use asm code to do this check instead. -; -; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s --check-prefix=ASM -; ASM: .section .debug_loclists,"",@progbits -; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # Length -; ASM-NEXT: .Ldebug_loclist_table_start0: -; ASM-NEXT: .short 5 # Version -; ASM-NEXT: .byte 8 # Address size -; ASM-NEXT: .byte 0 # Segment selector size -; ASM-NEXT: .long 0 # Offset entry count -; ASM-NEXT: .Lloclists_table_base0: -; ASM-NEXT: .Ldebug_loc0: -; ASM-NEXT: .byte 4 # DW_LLE_offset_pair -; ASM-NEXT: .uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset -; ASM-NEXT: .uleb128 .Ltmp0-.Lfunc_begin0 # ending offset -; ASM-NEXT: .byte 2 # Loc expr size -; ASM-NEXT: .byte 117 # DW_OP_breg5 -; ASM-NEXT: .byte 0 # 0 -; ASM-NEXT: .byte 4 # DW_LLE_offset_pair -; ASM-NEXT: .uleb128 .Ltmp0-.Lfunc_begin0 # starting offset -; ASM-NEXT: .uleb128 .Ltmp1-.Lfunc_begin0 # ending offset -; ASM-NEXT: .byte 2 # Loc expr size -; ASM-NEXT: .byte 115 # DW_OP_breg3 -; ASM-NEXT: .byte 0 # 0 -; ASM-NEXT: .byte 0 # DW_LLE_end_of_list -; ASM-NEXT: .Ldebug_loclist_table_end0: +; CHECK-NEXT: [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 => [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0 +; CHECK-NEXT: [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 => [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0 ---------------- labath wrote: > dblaikie wrote: > > labath wrote: > > > This tries to follow the RLE format as closely as possible, but I think > > > something like > > > ``` > > > [DW_LLE_offset_pair, 0x0000000000000000, 0x0000000000000004] => > > > [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0 > > > ``` > > > would make more sense (both here and for RLE). > > Yep, that'd make more sense to me - are you planning to unify the codepaths > > for this? I think that'd be for the best. > > > > If I were picking a printing from scratch, I might go with: > > > > DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0 > > > > Making it look a bit more like a function call and function arguments. > > Though the () might be confusing with the range notation. > > > > I'm also undecided on the " => " separator. Whether a ':' might be > > better/fine, etc. > > > > Totally open to ideas, but mostly I'd really love these to use loclist and > > ranges to use the same code as much as possible, so we can get consistency > > and any readability benefits, etc in both. > I like the function call format. I hoping to get some code reuse, though it's > still not fully clear to me how to achieve that.. I've posted my unification of range/loc/v4/v5 emission here: https://reviews.llvm.org/D68620 - & I'd imagine something similar in the parsing side. ================ Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7 # CHECK: DW_AT_location [DW_FORM_sec_offset] (0x0000000c -# CHECK-NEXT: [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0 -# CHECK-NEXT: [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, DW_OP_deref -# CHECK-NEXT: [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0 +# CHECK-NEXT: [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0 +# CHECK-NEXT: [DW_LLE_base_address ]: 0x0000000000000500 ---------------- labath wrote: > dblaikie wrote: > > I don't think the inline dumping should print the encoding - I'd borrow a > > lot from/try to unify with the ranges printing, which doesn't. I think > > verbose ranges print the same as non-verbose except they also add the > > section name/number. > Sure, I can do that, though I think that means there won't be a single place > where one can see both the raw encodings and their interpretation -- > section-based dumping will not show the interpretation (would you want me to > show still show them I they happen to be interpretable without the base > address or the address pool?), and the debug_info dumping will not show the > encoding. Is that bad? -- I don't know... Fair - that comes back to the issue I mentioned in a previous comment about potentially limiting dumping of non-debug_info sections based on the presence of a CU that references it (& only dumping it that way, rather than trying to parse it without a CU). DWARF isn't really designed to be parsed without the CU anyway. (could leave it in as best-effort to parse things without a referencing CU for debugging, etc). Mostly I'm interested in unification perhaps more/primarily, than feature improvements - then we can make feature improvements to both ranges and locs without having to duplicate things. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D68270/new/ https://reviews.llvm.org/D68270 _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits