Re: [PATCH] Have clang list the imported modules in the debug info

Adrian Prantl Tue, 17 Mar 2015 15:47:40 -0700

> On Mar 17, 2015, at 3:39 PM, David Blaikie <[email protected]> wrote:
> 
> 
> On Mar 17, 2015 3:28 PM, "Adrian Prantl" <[email protected] 
> <mailto:[email protected]>> wrote:
> >
> >
> >> On Mar 16, 2015, at 6:47 PM, David Blaikie <[email protected] 
> >> <mailto:[email protected]>> wrote:
> >>
> >>
> >>
> >> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <[email protected] 
> >> <mailto:[email protected]>> wrote:
> >>>
> >>>
> >>> Thanks for the explanation David, I missed that it is entirely the 
> >>> linker's (or some dwarf post-processor's) responsibility to find the 
> >>> module files and link in the debug info from the .pcm files, so the 
> >>> debugger doesn’t notice a difference.
> >>
> >>
> >> I think there's still some confusion here. Sorry if I'm rehashing 
> >> something, but I'll try to explain how this all works.
> >
> >
> > thanks!
> >
> >> Normal split DWARF:
> >>
> >> Compiler generates two files: .o and .dwo. 
> >> .dwo has static, non-relocatable debug info. 
> >> .o has a skeleton compile_unit that has the name of the .dwo file and a 
> >> hash to verify that the .dwo file isn't stale when the debugger reads it.
> >> The .o files are all linked together, the .dwo files stay where they are.
> >> The debugger reads the linked executable, finds the skeleton compile_units 
> >> contained therein, and finds/loads the .dwo files.
> >
> > That makes total sense.
> >
> > Now, to eliminate the last remaining misconception: Does LLVM actually emit 
> > the separate .dwo file currently?
> 
> Clang does emit the separate .dwo file.
> 
> > From looking at testcases like DebugInfo/X86/fission-cu.ll it appears as if 
> > the relocatable and the non-relocatable output both end up in the .o file. 
> > This is where I got the impression that there was another tool involved 
> > that extracted the non-relocatable content from the .o into a .dwo file, 
> > but maybe that’s just something we do for testing?
> 
> Llvm just puts everything in one file, then the clang driver runs a tool to 
> split them (objdump or something has a mode for doing this splitting). This 
> is just an implementation detail, we would do it directly in llvm, but 
> teaching llvm about outputting two object files simultaneously is hard.
> 
Right, the driver is invoking "objcopy --extract-dwo" and "objcopy --strip-dwo" 
on the .o file.
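
A minimal sketch of those two driver steps, written as a Python helper that only builds the command lines and does not run them. The helper name split_dwarf_commands and the file names are assumptions for illustration; only the two objcopy flags come from the message above.

```python
from pathlib import Path


def split_dwarf_commands(obj_path: str) -> list[list[str]]:
    """Build the two objcopy invocations for one object file (illustrative helper)."""
    obj = Path(obj_path)
    dwo = obj.with_suffix(".dwo")  # assumption: the .dwo sits next to the .o
    return [
        # Copy the .dwo sections out of the object into the .dwo file.
        ["objcopy", "--extract-dwo", str(obj), str(dwo)],
        # Remove those same sections from the object, leaving the skeleton.
        ["objcopy", "--strip-dwo", str(obj)],
    ]


for command in split_dwarf_commands("foo.o"):
    print(" ".join(command))
```

Running the extraction first matters in this sketch: once the .dwo sections have been stripped from the object there is nothing left to extract.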


Mystery solved.
-- adrian
> - David
> 
> >
> > -- adrian
> >
> >>
> >> The scenario I have in mind for module debug info is this:
> >> Module is compiled as an object file with debug info (this file is 
> >> actually a .dwo file, even if it has some other extension - it has the 
> >> non-relocatable debug info in it)
> >> .o file has a comdat'd skeleton compile_unit describing the .dwo/module 
> >> file
> >> <from here on no extra work is required, the linker and debugger just act 
> >> as normal>
> >> The .o files are linked together, the skeleton compile_units get 
> >> deduplicated by the linker (comdat sections)
> >> The debugger reads the linked executable, finds the skeleton compile_units 
> >> contained therein, and finds/loads the module files just as .dwo files.
> >>
> >> There's no need for a debug-aware linker or any DWARF post-processing so 
> >> far as I understand it. No module-linking is required. Debugger reads the 
> >> modules directly, just as if they were .dwo files - they're just object 
> >> files in the filesystem like any other (that they have a different 
> >> extension isn't too important).
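
The scenario above can be pictured with a toy model, offered only as a sketch: the class and function names (SkeletonCU, link, load) and the file names are hypothetical, and real linkers and debuggers work on sections and DWARF, not Python objects. It shows identical skeleton compile_units collapsing to one copy at link time, and the consumer then opening each named file and checking its hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkeletonCU:
    dwo_name: str  # path to the .dwo or module file
    dwo_id: int    # hash used to detect a stale file


def link(objects):
    """Toy linker: keep one copy of each identical (comdat) skeleton CU."""
    seen = {}
    for skeletons in objects:
        for cu in skeletons:
            seen.setdefault((cu.dwo_name, cu.dwo_id), cu)
    return list(seen.values())


def load(cu, files_on_disk):
    """Toy debugger: open the named file and reject it if the hash differs."""
    found_id = files_on_disk.get(cu.dwo_name)
    if found_id != cu.dwo_id:
        raise LookupError(f"missing or stale debug info: {cu.dwo_name}")
    return cu.dwo_name


module = SkeletonCU("MyModule.pcm", 0x1234)
executable = link([
    [SkeletonCU("a.dwo", 0xA), module],  # a.o
    [SkeletonCU("b.dwo", 0xB), module],  # b.o imports the same module
])
files_on_disk = {"a.dwo": 0xA, "b.dwo": 0xB, "MyModule.pcm": 0x1234}
print([load(cu, files_on_disk) for cu in executable])
```

Note that the module file goes through exactly the same load path as an ordinary .dwo file, which is the point of the scenario: nothing in the model needs to know that one of the files is a module.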
> >>
> >> Does this make sense?
> >>
> >>  
> >>>
> >>>
> >>>> On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected] 
> >>>> <mailto:[email protected]>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul 
> >>>> <[email protected] 
> >>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>> Beyond the above (that using a new tag would mean this would go from 
> >>>>> 'free' to 'not free' for GDB) having a new top level tag is pretty 
> >>>>> substantial (we only have two at the moment, and with our talk of 
> >>>>> modules being a "bag of dwarf" might go back to having one top level 
> >>>>> tag? (it's not clear to me from DWARF4 whether DW_TAG_module is 
> >>>>> currently a top-level tag, I don't think it is?)
> >>>>>
> >>>>> The .debug_info section contains one or more compilation units, partial 
> >>>>> units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if you 
> >>>>> want it to be handled independently then it would need to be wrapped in 
> >>>>> a DW_TAG_partial_unit.  You would probably then use 
> >>>>> DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
> >>>>
> >>>>
> >>>> This makes a fair bit of sense - though the terminology's never going to 
> >>>> quite line up with modules, I suspect, and this would still require 
> >>>> modifying existing consumers (well, GDB) that can handle split-dwarf 
> >>>> today, I suspect (not sure how it'd handle partial_unit - maybe that 
> >>>> does work? - and still don't know how existing consumers would handle 
> >>>> imported_unit either - could be worth some testing, as it sounds sort of 
> >>>> right out of several less right options).
> >>>
> >>>
> >>> The standard specifically recommends DW_TAG_partial_unit for #include 
> >>> directives so that sounds like a comparatively good match. Partial units 
> >>> were already introduced in DWARF3 so maybe GDB supports them. But even if 
> >>> it doesn’t this shouldn’t necessarily be a problem (unless it crashes). 
> >>> The same goes for DW_TAG_imported_unit, since this is primarily useful 
> >>> for AST-based debuggers that know how to import a module before 
> >>> expression evaluation.
> >>>
> >>> -- adrian
> >>>
> >>>> - David 
> >>>>>
> >>>>> (Sorry about the top-quoting but Outlook can't handle HTML editing 
> >>>>> properly.)
> >>>
> >>>
> >>> Unfortunately the gmail client somewhat forces a thread to HTML — gmail 
> >>> quotation markers mysteriously disappear in the plain text version 
> >>> displayed by other mail clients.
> >>>
> >>>>> --paulr
> >>>>>
> >>>>>  
> >>>>>
> >>>>> From: David Blaikie [mailto:[email protected] 
> >>>>> <mailto:[email protected]>] 
> >>>>> Sent: Monday, March 16, 2015 1:36 PM
> >>>>> To: Adrian Prantl
> >>>>> Cc: Richard Smith; Eric Christopher; llvm cfe; Greg Clayton; Robinson, 
> >>>>> Paul
> >>>>> Subject: Re: [PATCH] Have clang list the imported modules in the debug 
> >>>>> info
> >>>>>
> >>>>>  
> >>>>>
> >>>>>  
> >>>>>
> >>>>>  
> >>>>>
> >>>>> On Mon, Mar 16, 2015 at 1:24 PM, Adrian Prantl <[email protected] 
> >>>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>>  
> >>>>>>
> >>>>>> On Mar 10, 2015, at 12:10 PM, David Blaikie <[email protected] 
> >>>>>> <mailto:[email protected]>> wrote:
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>> On Tue, Mar 10, 2015 at 12:05 PM, Adrian Prantl <[email protected] 
> >>>>>> <mailto:[email protected]>> wrote:
> >>>>>>
> >>>>>>  
> >>>>>>>
> >>>>>>> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected] 
> >>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] 
> >>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>
> >>>>>>>  
> >>>>>>>>
> >>>>>>>> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] 
> >>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] 
> >>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] 
> >>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] 
> >>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] 
> >>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] 
> >>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] 
> >>>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl 
> >>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] 
> >>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl 
> >>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie 
> >>>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl 
> >>>>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie 
> >>>>>>>>>>>>>>>>>> > <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>>>>> > My vague recollection from the previous design 
> >>>>>>>>>>>>>>>>>> > discussions was that these module references would be 
> >>>>>>>>>>>>>>>>>> > their own 'unit' COMDAT'd so that we don't end up with 
> >>>>>>>>>>>>>>>>>> > the duplication of every module reference in every unit 
> >>>>>>>>>>>>>>>>>> > linked together when linking debug info?
> >>>>>>>>>>>>>>>>>> >
> >>>>>>>>>>>>>>>>>> > I think in my brain I'd been picturing this module 
> >>>>>>>>>>>>>>>>>> > reference as being an extended fission reference 
> >>>>>>>>>>>>>>>>>> > (fission skeleton CU + extra fields for users who want 
> >>>>>>>>>>>>>>>>>> > to load the Clang AST module directly and skip the split 
> >>>>>>>>>>>>>>>>>> > CU).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Apologies for letting this rest for so long.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Your memory was of course correct and I didn’t follow up 
> >>>>>>>>>>>>>>>>>> on this because I had convinced myself that the fission 
> >>>>>>>>>>>>>>>>>> reference would be completely sufficient. Now that I’ve 
> >>>>>>>>>>>>>>>>>> been thinking some more about it, I don’t think that it is 
> >>>>>>>>>>>>>>>>>> sufficient in the LTO case.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Here is the example from the 
> >>>>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html
> >>>>>>>>>>>>>>>>>>  
> >>>>>>>>>>>>>>>>>> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> foo.o:
> >>>>>>>>>>>>>>>>>> .debug_info.dwo
> >>>>>>>>>>>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>>>>>>>>>>      // For DWARF consumers
> >>>>>>>>>>>>>>>>>>      DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
> >>>>>>>>>>>>>>>>>>      DW_AT_dwo_id   ([unique AST signature])
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> .debug_info
> >>>>>>>>>>>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>>>>>>>>>>     DW_TAG_variable
> >>>>>>>>>>>>>>>>>>       DW_AT_name "x"
> >>>>>>>>>>>>>>>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> In this example it is clear that foo.o imported MyModule 
> >>>>>>>>>>>>>>>>>> because its DWO skeleton is there in the same object file. 
> >>>>>>>>>>>>>>>>>> But if we deal with the result of an LTO compilation we 
> >>>>>>>>>>>>>>>>>> will end up with many compile units in the same 
> >>>>>>>>>>>>>>>>>> .debug_info section, plus a bunch of skeleton compile 
> >>>>>>>>>>>>>>>>>> units for _all_ imported modules in the entire project. We 
> >>>>>>>>>>>>>>>>>> thus lose the ability to determine which of the compile 
> >>>>>>>>>>>>>>>>>> units imported which module.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Why would we need to know which CU imported which modules? 
> >>>>>>>>>>>>>>>>> (I can imagine some possible reasons, but wondering what 
> >>>>>>>>>>>>>>>>> you have in mind)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> When the debugger is stopped at a breakpoint and the user 
> >>>>>>>>>>>>>>>> wants to evaluate an expression, it should import the 
> >>>>>>>>>>>>>>>> modules that are available at this location, so the user can 
> >>>>>>>>>>>>>>>> write the expression from within the context of the 
> >>>>>>>>>>>>>>>> breakpoint (e.g., without having to fully qualify each type, 
> >>>>>>>>>>>>>>>> etc).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm not sure how much current debuggers actually worry about 
> >>>>>>>>>>>>>>> that - (& this may differ from lldb to gdb to other things, 
> >>>>>>>>>>>>>>> of course). I'm pretty sure at least for GDB, a context in 
> >>>>>>>>>>>>>>> one CU is as good as one in another (at least without 
> >>>>>>>>>>>>>>> split-dwarf, type units, etc - with those sometimes things 
> >>>>>>>>>>>>>>> end up overly restrictive as the debugger won't search 
> >>>>>>>>>>>>>>> everything properly).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } 
> >>>>>>>>>>>>>>> and you run 'start' in gdb (which breaks at the beginning of 
> >>>>>>>>>>>>>>> main) you can still run 'p func()' to call the func, even 
> >>>>>>>>>>>>>>> though there's no declaration of it in a.cpp, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> LLDB would definitely care (as it is using clang for the 
> >>>>>>>>>>>>>> expression evaluation, supporting these kinds of features is 
> >>>>>>>>>>>>>> really straightforward there). By importing the modules 
> >>>>>>>>>>>>>> (rather than searching through the DWARF), the expression 
> >>>>>>>>>>>>>> evaluator gains access to additional declarations that are not 
> >>>>>>>>>>>>>> there in the DWARF, such as templates. But since clang modules 
> >>>>>>>>>>>>>> are not namespaces, we can’t generally “import the world” as a 
> >>>>>>>>>>>>>> debugger would usually do.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry, not sure I understand this last sentence - could you 
> >>>>>>>>>>>>> explain further?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I imagine it would be rather limiting for the user if they 
> >>>>>>>>>>>>> could only use expressions that are valid in this file from the 
> >>>>>>>>>>>>> file - it wouldn't be uncommon to want to call a function from 
> >>>>>>>>>>>>> another module/file/etc to aid in debugging.
> >>>>>>>>>>>>
> >>>>>>>>>>>>  
> >>>>>>>>>>>>
> >>>>>>>>>>>> Usually LLDB’s expression evaluator works by creating a clang 
> >>>>>>>>>>>> AST type out of a DWARF type and inserting it into its AST 
> >>>>>>>>>>>> context. We could pre-populate it with the definitions from the 
> >>>>>>>>>>>> imported modules (with all sorts of benefits as described 
> >>>>>>>>>>>> above), but that only works if no two modules conflict. If the 
> >>>>>>>>>>>> declaration can’t be found in any imported module, LLDB would 
> >>>>>>>>>>>> still import it from DWARF in the “traditional” fashion.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> But it would import it from DWARF in other TUs rather than use 
> >>>>>>>>>>> the module info just because the module wasn't directly 
> >>>>>>>>>>> referenced from this TU? That would seem strange to me. (you 
> >>>>>>>>>>> would lose debug info fidelity (by falling back to DWARF even 
> >>>>>>>>>>> though there are modules with the full fidelity info) 
> >>>>>>>>>>> unnecessarily, it sounds like)
> >>>>>>>>>>
> >>>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> I think it’s reasonable to expect full fidelity for everything 
> >>>>>>>>>> that is available in the current TU, and having the normal 
> >>>>>>>>>> DWARF-based debugging capabilities for everything beyond that. But 
> >>>>>>>>>> we can only ever provide full fidelity if we have the list of 
> >>>>>>>>>> imports for the current TU.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Would it be reasonable to use the accelerator table/index to 
> >>>>>>>>>> lookup the types, then if the type is in the module you could use 
> >>>>>>>>>> the module rather than the DWARF stashed alongside it? (so the 
> >>>>>>>>>> comdat'd split-dwarf skeleton CU for the module would have an 
> >>>>>>>>>> index to tell you what names are inside it, but if you got an 
> >>>>>>>>>> index hit you'd just look at the module instead of loading the 
> >>>>>>>>>> split-dwarf debug info in the referenced file)
> >>>>>>>>>>
> >>>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> I don’t think this approach would work for templates and 
> >>>>>>>>>> enumerator values;
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Not sure why enumerator values are an issue - but templates (& all 
> >>>>>>>>> manner of other things that don't make it into the index, 
> >>>>>>>>> unfortunately), sure.
> >>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> they aren’t in the accelerator tables to begin with. It would also 
> >>>>>>>>>> be slower if the declaration is available in a module.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Though you're rapidly going to end up loading a lot of modules in 
> >>>>>>>>> (as you go up & down a stack printing various things you'll cross 
> >>>>>>>>> into other TUs & load more modules).
> >>>>>>>>>
> >>>>>>>>> For a standard DWARF consumer, it seems fine to just have a 
> >>>>>>>>> comdat'd skeleton CU for a module without the need for other CUs to 
> >>>>>>>>> mention which module CUs they reference (but I could be wrong here) 
> >>>>>>>>> & that's the design we originally discussed.
> >>>>>>>>>
> >>>>>>>>> It would seem unfortunate to bloat every CU with a non-deduplicable 
> >>>>>>>>> list of every module it references, but if that's necessary for a 
> >>>>>>>>> serialized AST aware debugger, it might be fine to have it as an 
> >>>>>>>>> option (so long as it can be turned off) & may still benefit from 
> >>>>>>>>> that list not being the authoritative module reference, but a 
> >>>>>>>>> /very/ terse reference to it so all the extra flags & stuff can be 
> >>>>>>>>> in the deduplicable comdat (& to keep it as consistent as possible 
> >>>>>>>>> between the flag (on/off) codepaths for this extra data). Maybe a 
> >>>>>>>>> FORM_block (?) of fixed-size hashes of all the modules 
> >>>>>>>>> back-to-back, so it's as small as possible?
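
To make the "fixed-size hashes back-to-back" idea concrete, here is a minimal sketch of such a block. The 64-bit width, the little-endian byte order, and the function names are all assumptions for illustration; the thread does not settle on an encoding, and the hash values are placeholders.

```python
import struct

HASH_SIZE = 8  # assumption: each module hash is 64 bits wide


def pack_module_hashes(hashes):
    """Concatenate the hashes into one little-endian byte block."""
    return b"".join(struct.pack("<Q", h) for h in hashes)


def unpack_module_hashes(block):
    """Split the block back into fixed-size hashes."""
    if len(block) % HASH_SIZE != 0:
        raise ValueError("block length is not a multiple of the hash size")
    return [h for (h,) in struct.iter_unpack("<Q", block)]


block = pack_module_hashes([0x1234ABCDE, 0xDEADBEEF])
print(len(block), [hex(h) for h in unpack_module_hashes(block)])
```

With this layout the per-CU cost is HASH_SIZE bytes per imported module and nothing else, since names, flags and paths would live in the deduplicated skeleton CU.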
> >>>>>>>>>
> >>>>>>>>> But I wouldn't mind spending some more time discussing whether 
> >>>>>>>>> there's a better way to keep these things streamlined/symmetric/the 
> >>>>>>>>> same between modular and non-modular debug info.
> >>>>>>>>
> >>>>>>>> Sure!
> >>>>>>>>
> >>>>>>>> Now that we established that recording the list of imported modules 
> >>>>>>>> for every CU is useful for an AST-based debugger,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> +Richard, just to see if he's got some ideas about how a debugger 
> >>>>>>>> might efficiently use modules to support debugger scenarios and 
> >>>>>>>> whether or not having a list of which modules are referenced from 
> >>>>>>>> which contexts is valuable in that.
> >>>>>>>>
> >>>>>>>> It still concerns me that this would create something of a 
> >>>>>>>> regression/oddity/difference between AST-based debug info (you 
> >>>>>>>> wouldn't be able to handle expressions referencing things in other 
> >>>>>>>> TUs) and non-AST based debug info (where I think the average user is 
> >>>>>>>> used to not worrying about what headers are included in the current 
> >>>>>>>> file they're debugging when they try to use a type or other 
> >>>>>>>> identifier)
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>> If I understood you correctly, this is not actually the case. The 
> >>>>>>> list of imported modules allows the AST-based debugger to import all 
> >>>>>>> the modules that were imported by the CU that the current frame is 
> >>>>>>> in. This enables the user to, e.g., type "p myVector->size()" even 
> >>>>>>> though std::vector<MyClass>::size() was not used by the CU and is 
> >>>>>>> thus not available in DWARF. 
> >>>>>>>>
> >>>>>>>> If the user types “p foo” even though foo was not defined in any 
> >>>>>>>> imported module the debugger can — after failing to import foo via 
> >>>>>>>> clang — still fall back to looking up foo in DWARF and do what it 
> >>>>>>>> always did.
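
The lookup order described here can be sketched as follows. This is only an illustration under stated assumptions: the function resolve, the dictionaries, and the example names are hypothetical and do not reflect LLDB's actual interfaces.

```python
def resolve(name, cu, imports_by_cu, module_decls, dwarf_decls):
    """Look in the modules imported by the current CU first, then in DWARF."""
    for module in imports_by_cu.get(cu, []):
        if name in module_decls.get(module, set()):
            return f"{name} from module {module}"
    if name in dwarf_decls:
        return f"{name} from DWARF"
    raise KeyError(name)


imports_by_cu = {"foo.c": ["Foundation"]}    # per-CU import list
module_decls = {"Foundation": {"NSString"}}  # declarations each module provides
dwarf_decls = {"foo", "NSString"}            # names found by a plain DWARF search

print(resolve("NSString", "foo.c", imports_by_cu, module_decls, dwarf_decls))
print(resolve("foo", "foo.c", imports_by_cu, module_decls, dwarf_decls))
```

The first step depends on imports_by_cu, which is exactly the per-CU list of imported modules under discussion; without it only the DWARF branch remains.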
> >>>>>>>
> >>>>>>>
> >>>>>>> If you do the DWARF fallback then you'll get a pretty clear 
> >>>>>>> inconsistency between templates and non-templates. If I have a 
> >>>>>>> function foo and a function template foo_tmpl in one file, and I'm 
> >>>>>>> debugging in another file I'll be able to call 'foo' (normal DWARF 
> >>>>>>> fallback/search) but not foo_tmpl (if I'm calling a new instantiation 
> >>>>>>> of foo_tmpl - if I'm calling an existing instantiation presumably the 
> >>>>>>> fallback would catch me). Seems unfortunate/confusing, perhaps.
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>> Good point, but my guess is that this wouldn’t be any worse than 
> >>>>>> the “why can’t I print the size() of this vector!?”-situations we have 
> >>>>>> at the moment.
> >>>>>>
> >>>>>>
> >>>>>> Sure - it's strictly better in the sense that there are strictly more 
> >>>>>> expressions that can be evaluated, but seems incomplete is my point, 
> >>>>>> and maybe worth considering alternative designs that might be 
> >>>>>> more-betterer.
> >>>>>>  
> >>>>>>>
> >>>>>>> In certain situations (i.e., non-templates) the debugger could use 
> >>>>>>> the DWARF in the modules to print a message about which module to 
> >>>>>>> import.
> >>>>>>>
> >>>>>>>  
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> let’s talk about how to most efficiently represent this information.
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> In the CU, using DW_TAG_imported_module appears to be the most 
> >>>>>>>>> appropriate choice, even though there is some room for confusion 
> >>>>>>>>> since C++ using declarations are also represented this way. Inside 
> >>>>>>>>> the DW_TAG_imported_module, we could use 
> >>>>>>>>>
> >>>>>>>>> (1) a DW_AT_import that references the skeleton (I hope that is the 
> >>>>>>>>> right terminology) CU for the module, the idea being that the 
> >>>>>>>>> skeleton CU would contain all the details (flags, name, include 
> >>>>>>>>> dirs, hash, ...) and be in a comdat'ed section.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I'd be concerned about overloading the terminology & confusing other 
> >>>>>>>> debuggers - they might try to follow the DW_AT_import and be 
> >>>>>>>> surprised that it doesn't refer to a DW_TAG_namespace tag.
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>> That’s a valid concern, and we probably should not be emitting this 
> >>>>>>>> if we have any evidence of, e.g., gdb crashes when encountering such 
> >>>>>>>> a construct. Then again, we would be using a DW_TAG_imported_module 
> >>>>>>>> to express what it is meant to express according to the DWARF spec 
> >>>>>>>> (namely importing a module)... but I admit that the tag also does 
> >>>>>>>> have a very specific meaning for C++, which we maybe shouldn’t 
> >>>>>>>> overload.
> >>>>>>>
> >>>>>>>
> >>>>>>> That's my concern, yes.
> >>>>>>>  
> >>>>>>>>
> >>>>>>>> The right thing here is probably to put aside my personal sense of 
> >>>>>>>> aesthetics and use a private _LLVM_ namespace for all new additions, 
> >>>>>>>> and then attempt to standardize an official DWARF version once we 
> >>>>>>>> know what is really needed and what isn't.
> >>>>>>>
> >>>>>>>
> >>>>>>> I'd prefer this, yes. I mean the usual bar we use for language 
> >>>>>>> features is that they're at least proposed for standardization before 
> >>>>>>> we adopt them in clang - I wouldn't mind a similar bar here. If you 
> >>>>>>> want to bring up this use of DW_TAG_imported_module with the DWARF 
> >>>>>>> committee & see if it sounds reasonable (& test/inquire about GDB's 
> >>>>>>> behavior here).
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>> I started a thread on dwarf-discuss to this end 
> >>>>>>> (http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org 
> >>>>>>> <http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, 
> >>>>>>> the list archives are only visible to subscribers, but anyone can 
> >>>>>>> subscribe).
> >>>>>>
> >>>>>>
> >>>>>> Cool cool
> >>>>>
> >>>>>  
> >>>>>
> >>>>> To paraphrase the replies that my question solicited: We are, perhaps 
> >>>>> not very surprisingly, encouraged to follow the standard and use a 
> >>>>> DW_TAG_imported_module that references a DW_TAG_module. If we, however, 
> >>>>> choose to describe the module by using a skeleton DW_TAG_compile_unit, 
> >>>>> we should be careful (my own words) about using a 
> >>>>> DW_TAG_imported_module until that use is sanctioned by the standard.
> >>>>>
> >>>>>  
> >>>>>
> >>>>> I see two possible ways to proceed in this spirit:
> >>>>>
> >>>>> a) Rename the module skeleton DW_TAG_compile_units to DW_TAG_module, 
> >>>>> but keep all the comdat/split dwarf goodness from the original proposal 
> >>>>> [1]. My understanding is that even though we are making clever use of 
> >>>>> the split DWARF features, GDB would still need to be taught to follow 
> >>>>> references to external files,
> >>>>>
> >>>>>
> >>>>> Not sure what you're referring to here, perhaps a misunderstanding 
> >>>>> about how split DWARF works.
> >>>>>
> >>>>> To the best of my knowledge, what we've talked about for module DWARF 
> >>>>> debug info is actually just split-dwarf, no extra work required by 
> >>>>> DWARF consumers*.
> >>>>>
> >>>>> * It's, admittedly, a little tricksy to include type unit references in 
> >>>>> an object file that doesn't include the type unit at all - relying on 
> >>>>> it being linked into the final executable. But DWARF doesn't really 
> >>>>> talk about objects versus executables, etc - so, so long as the type 
> >>>>> unit is there in the end, it's valid DWARF no matter how it got there 
> >>>>> (& should work fine for existing consumers - they can't tell if the 
> >>>>> type unit was in every object file that referenced the type or not once 
> >>>>> it's been linked and deduplicated).
> >>>>>  
> >>>>>>
> >>>>>> so having it recognize a new tag in this context doesn’t appear to be 
> >>>>>> much additional effort (but others may provide more insight here).
> >>>>>
> >>>>>
> >>>>> Beyond the above (that using a new tag would mean this would go from 
> >>>>> 'free' to 'not free' for GDB) having a new top level tag is pretty 
> >>>>> substantial (we only have two at the moment, and with our talk of 
> >>>>> modules being a "bag of dwarf" might go back to having one top level 
> >>>>> tag? (it's not clear to me from DWARF4 whether DW_TAG_module is 
> >>>>> currently a top-level tag, I don't think it is?)
> >>>>>  
> >>>>>>
> >>>>>> b) Emit an LLVM-specific DW_AT_LLVM_import attribute inside the 
> >>>>>> DW_TAG_imported_module (or vice versa) that refers to the skeleton 
> >>>>>> DW_TAG_compile_unit.
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>> I think that option (a) is a bit more elegant and it is bending the 
> >>>>>> dwarf standard not quite as much and will make the dwarf output a bit 
> >>>>>> more readable.
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>> -- adrian
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>> [1] Module debugging proposal for reference: 
> >>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html 
> >>>>>> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> - David
> >>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>> -- adrian
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> But extension tags seems like the conservatively correct option (not 
> >>>>>>> sure what GDB does on tags it doesn't recognize - I forget if it 
> >>>>>>> warns or just completely ignores them, hopefully the latter)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> (2) David’s suggestion of using a custom form that records the 
> >>>>>>>>>> module hash directly is quite space-efficient, but it has the 
> >>>>>>>>>> drawback of not being resilient against small changes to the 
> >>>>>>>>>> imported module
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That's going to be true of the normal fission info here (the 
> >>>>>>>>> skeleton CU and the full CU in the .dwo file (or module) are 
> >>>>>>>>> associated by hash) - granted, in the "loading an AST" mode, you 
> >>>>>>>>> can ignore those hashes and rely on your custom attributes instead.
> >>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> , since clang’s module hash changes each time the module is being 
> >>>>>>>>>> rebuilt.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Clang's module hash only changes if the DWARF contents change - it 
> >>>>>>>>> doesn't use a timestamp or anything. It seems like actually you're 
> >>>>>>>>> going to want to fail to load even more aggressively - there are 
> >>>>>>>>> ways the AST might've changed that the debug info doesn't reflect 
> >>>>>>>>> but are still important (a type unreferenced in this module, but 
> >>>>>>>>> built into some other code that is not built with debug info 
> >>>>>>>>> changes - no hash changes because the debug info for that type is 
> >>>>>>>>> unreferenced here, but if you try to use it you could have an 
> >>>>>>>>> incompatible layout, etc).
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>> Agreed: If the module contents changed the debugger needs to display 
> >>>>>>>> a big flashing "here be dragons" warning.
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> This is less of an issue if the hash is referring to a skeleton CU 
> >>>>>>>>> in the same file, which contains all the detailed information.
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> Personally I’d prefer option 1 because it mostly uses the existing 
> >>>>>>>>> mechanisms from DWARF. Here’s a visual guide to the options on the 
> >>>>>>>>> table:
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> (1)
> >>>>>>>>>
> >>>>>>>>> foo.o (compiled with, let’s call it .. “-gmodule-imports”)
> >>>>>>>>>
> >>>>>>>>> -----
> >>>>>>>>>
> >>>>>>>>> .debug_info:
> >>>>>>>>>
> >>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>
> >>>>>>>>>     DW_AT_name(“foo.c”)
> >>>>>>>>>
> >>>>>>>>>     DW_TAG_imported_module
> >>>>>>>>>
> >>>>>>>>>       DW_AT_import(DW_FORM_ref_addr 0x123)  // Could be a FORM_ref_sig8 0x1234ABCDE as well.
> >>>>>>>>>
> >>>>>>>>>     DW_TAG_imported_module
> >>>>>>>>>
> >>>>>>>>>       DW_AT_import(...)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> .debug_info.dwo:
> >>>>>>>>>
> >>>>>>>>> // Skeleton CUs for modules imported by foo.o.
> >>>>>>>>>
> >>>>>>>>> 0x123:
> >>>>>>>>>
> >>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>
> >>>>>>>>>     // Used by split-dwarf debuggers to find external type definitions.
> >>>>>>>>>
> >>>>>>>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >>>>>>>>>
> >>>>>>>>>     DW_AT_dwo_id(“0x1234ABCDE”)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>>     // Used by AST-based debuggers to import the module.
> >>>>>>>>>
> >>>>>>>>>     DW_AT_name(“Foundation”)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> (side notes: the mixed indentation here makes it a bit hard to read 
> >>>>>>>> this example, and I'd make sure /all/ the extended attributes 
> >>>>>>>> (including the name here) use custom attribute names, not standard 
> >>>>>>>> ones)
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>> Agreed.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>>
> >>>>>>>>>     DW_AT_LLVM_sysroot(“/”)
> >>>>>>>>>
> >>>>>>>>>     DW_AT_LLVM_include_dir(“”)
> >>>>>>>>>
> >>>>>>>>>     DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> (2)
> >>>>>>>>>
> >>>>>>>>> .debug_info.dwo:
> >>>>>>>>>
> >>>>>>>>> (As above.)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> .debug_info:
> >>>>>>>>>
> >>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>
> >>>>>>>>>     DW_AT_name(“foo.c”)
> >>>>>>>>>
> >>>>>>>>>     DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE 0xDEADBEEF 0x....)
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> Now I’m curious what option (3) will look like; the one that we’ll 
> >>>>>>>>> actually implement!
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ;)
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>>
> >>>>>>>> -- adrian
> >>>>>>>>
> >>>>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>
> >>>>>>  
> >>>>>>
> >>>>>>  
> >>>>>
> >>>>>  
> >>>>
> >>>>
> >>>
> >>
> >

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits