> On Mar 17, 2015, at 3:39 PM, David Blaikie <[email protected]> wrote: > > > On Mar 17, 2015 3:28 PM, "Adrian Prantl" <[email protected] > <mailto:[email protected]>> wrote: > > > > > >> On Mar 16, 2015, at 6:47 PM, David Blaikie <[email protected] > >> <mailto:[email protected]>> wrote: > >> > >> > >> > >> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <[email protected] > >> <mailto:[email protected]>> wrote: > >>> > >>> > >>> Thanks for the explanation David, I missed that it is entirely the > >>> linker's (or some dwarf post-processor's) responsibility to find the > >>> module files and link in the debug info from the .pcm files, so debugger > >>> doesn’t notice a difference. > >> > >> > >> I think there's still some confusion here. Sorry if I'm rehashing > >> something, but I'll try to explain how this all works. > > > > > > thanks! > > > >> Normal split DWARF: > >> > >> Compiler generates two files: .o and .dwo. > >> .dwo has static, non-relocatable debug info. > >> .o has a skeleton compile_unit that has the name of the .dwo file and a > >> hash to verify that the .dwo file isn't stale when the debugger reads it. > >> The .o files are all linked together, the .dwo files stay where they are. > >> The debugger reads the linked executable, finds the skeleton compile_units > >> contained therein, and find/loads the .dwo files > > > > That makes total sense. > > > > Now, to eliminate the last remaining misconception: Does LLVM actually emit > > the separate .dwo file currently? > > Clang does emit the separate .dwo file. > > > From looking at testcases like DebugInfo/X86/fission-cu.ll it appears as if > > the relocatable and the non-relocatable output both end up in the .o file. > > This is where I got the impression that there was another tool involved > > that extracted the non-relocateable content from the .o into a .dwo file, > > but maybe that’s just something we do for testing? 
>
> Llvm just puts everything in one file, then the clang driver runs a tool to
> split them (objdump or something has a mode for doing this splitting). This
> is just an implementation detail, we would do it directly in llvm, but
> teaching llvm about outputting two object files simultaneously is hard.
>

Right, the driver is invoking "objcopy --extract-dwo" and "objcopy --strip-dwo" on the .o file.
Mystery solved. -- adrian > - David > > > > > -- adrian > > > >> > >> The scenario I have in mind for module debug info is this: > >> Module is compiled as an object file with debug info (this file is > >> actually a .dwo file, even if it has some other extension - it has the > >> non-relocatable debug info in it) > >> .o file has a comdat'd skeleton compile_unit describing the .dwo/module > >> file > >> <from here on no extra work is required, the linker and debugger just act > >> as normal> > >> The .o files are linked together, the skeleton compile_units get > >> deduplicated by the linker (comdat sections) > >> The debugger reads the linked executable, finds the skeleton compile_units > >> contained therein, and find/loads the module files just as .dwo files. > >> > >> There's no need for a debug-aware linker or any DWARF post-processing so > >> far as I understand it. No module-linking is required. Debugger reads the > >> modules directly, just as if they were .dwo files - they're just object > >> files in the filesystem like any other (that they have a different > >> extension isn't too important). > >> > >> Does this make sense? > >> > >> > >>> > >>> > >>>> On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected] > >>>> <mailto:[email protected]>> wrote: > >>>> > >>>> > >>>> > >>>> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul > >>>> <[email protected] > >>>> <mailto:[email protected]>> wrote: > >>>>> > >>>>> Beyond the above (that using a new tag would mean this would go from > >>>>> 'free' to 'not free' for GDB) having a new top level tag is pretty > >>>>> substantial (we only have two at the moment, and with our talk of > >>>>> modules being a "bag of dwarf" might go back to having one top level > >>>>> tag? (it's not clear to me from DWARF4 whether DW_TAG_module is > >>>>> currently a top-level tag, I don't think it is?) 
> >>>>> > >>>>> The .debug_info section contains one or more compilation units, partial > >>>>> units, or in DWARF 5, type units. DW_TAG_module isn't a unit, if you > >>>>> want it to be handled independently then it would need to be wrapped in > >>>>> a DW_TAG_partial_unit. You would probably then use > >>>>> DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module. > >>>> > >>>> > >>>> This makes a fair bit of sense - though the terminology's never going to > >>>> quite line up with modules, I suspect, and this would still require > >>>> modifying existing consumers (well, GDB) that can handle split-dwarf > >>>> today, I suspect (not sure how it'd handle partial_unit - maybe that > >>>> does work? - and still don't know how existing consumers would handle > >>>> imported_unit either - could be worth some testing, as it sounds sort of > >>>> right out of several less right options). > >>> > >>> > >>> The standard specifically recommends DW_TAG_partial_unit for #include > >>> directives so that sounds like a comparatively good match. Partial units > >>> were already introduced in DWARF3 so maybe GDB supports them. But even if > >>> it doesn’t this shouldn’t necessarily be a problem (unless it crashes). > >>> The DW_TAG_imported_unit since this is primarily useful for AST-based > >>> debuggers that know how to import a module before expression evaluation. > >>> > >>> -- adrian > >>> > >>>> - David > >>>>> > >>>>> (Sorry about the top-quoting but Outlook can't handle HTML editing > >>>>> properly.) > >>> > >>> > >>> Unfortunately the gmail client somewhat forces a thread to HTML — gmail > >>> quotation markers mysteriously disappear in the plain text version > >>> displayed by other mail clients. 
> >>> > >>>>> --paulr > >>>>> > >>>>> > >>>>> > >>>>> From: David Blaikie [mailto:[email protected] > >>>>> <mailto:[email protected]>] > >>>>> Sent: Monday, March 16, 2015 1:36 PM > >>>>> To: Adrian Prantl > >>>>> Cc: Richard Smith; Eric Christopher; llvm cfe; Greg Clayton; Robinson, > >>>>> Paul > >>>>> Subject: Re: [PATCH] Have clang list the imported modules in the debug > >>>>> info > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Mar 16, 2015 at 1:24 PM, Adrian Prantl <[email protected] > >>>>> <mailto:[email protected]>> wrote: > >>>>> > >>>>> > >>>>>> > >>>>>> On Mar 10, 2015, at 12:10 PM, David Blaikie <[email protected] > >>>>>> <mailto:[email protected]>> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, Mar 10, 2015 at 12:05 PM, Adrian Prantl <[email protected] > >>>>>> <mailto:[email protected]>> wrote: > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> On Mar 9, 2015, at 5:16 PM, David Blaikie <[email protected] > >>>>>>> <mailto:[email protected]>> wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Mon, Mar 9, 2015 at 5:07 PM, Adrian Prantl <[email protected] > >>>>>>> <mailto:[email protected]>> wrote: > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> On Mar 9, 2015, at 2:14 PM, David Blaikie <[email protected] > >>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Mar 9, 2015 at 1:52 PM, Adrian Prantl <[email protected] > >>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> On Feb 24, 2015, at 3:06 PM, David Blaikie <[email protected] > >>>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>>> > >>>>>>>>> On Tue, Feb 24, 2015 at 2:56 PM, Adrian Prantl <[email protected] > >>>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>>>> > >>>>>>>>>> On Feb 24, 2015, at 2:36 PM, David Blaikie <[email protected] > >>>>>>>>>> <mailto:[email protected]>> wrote: > 
>>>>>>>>>>> > >>>>>>>>>>> On Mon, Feb 23, 2015 at 3:45 PM, Adrian Prantl <[email protected] > >>>>>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Feb 23, 2015, at 3:37 PM, David Blaikie <[email protected] > >>>>>>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, Feb 23, 2015 at 3:32 PM, Adrian Prantl > >>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Feb 23, 2015, at 3:14 PM, David Blaikie <[email protected] > >>>>>>>>>>>>>> <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 3:08 PM, Adrian Prantl > >>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Feb 23, 2015, at 2:59 PM, David Blaikie > >>>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 2:51 PM, Adrian Prantl > >>>>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > On Jan 20, 2015, at 11:07 AM, David Blaikie > >>>>>>>>>>>>>>>>>> > <[email protected] <mailto:[email protected]>> wrote: > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > My vague recollection from the previous design > >>>>>>>>>>>>>>>>>> > discussions was that these module references would be > >>>>>>>>>>>>>>>>>> > their own 'unit' COMDAT'd so that we don't end up with > >>>>>>>>>>>>>>>>>> > the duplication of every module reference in every unit > >>>>>>>>>>>>>>>>>> > linked together when linking debug info? > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > I think in my brain I'd been picturing this module > >>>>>>>>>>>>>>>>>> > reference as being an extended fission reference > >>>>>>>>>>>>>>>>>> > (fission skeleton CU + extra fields for users who want > >>>>>>>>>>>>>>>>>> > to load the Clang AST module directly and skip the split > >>>>>>>>>>>>>>>>>> > CU). 
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Apologies for letting this rest for so long.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Your memory was of course correct and I didn’t follow up
> >>>>>>>>>>>>>>>>>> on this because I had convinced myself that the fission
> >>>>>>>>>>>>>>>>>> reference would be completely sufficient. Now that I’ve
> >>>>>>>>>>>>>>>>>> been thinking some more about it, I don’t think that it is
> >>>>>>>>>>>>>>>>>> sufficient in the LTO case.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Here is the example from
> >>>>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> foo.o:
> >>>>>>>>>>>>>>>>>> .debug_info.dwo
> >>>>>>>>>>>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>>>>>>>>>>     // For DWARF consumers
> >>>>>>>>>>>>>>>>>>     DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
> >>>>>>>>>>>>>>>>>>     DW_AT_dwo_id ([unique AST signature])
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> .debug_info
> >>>>>>>>>>>>>>>>>>   DW_TAG_compile_unit
> >>>>>>>>>>>>>>>>>>     DW_TAG_variable
> >>>>>>>>>>>>>>>>>>       DW_AT_name "x"
> >>>>>>>>>>>>>>>>>>       DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> In this example it is clear that foo.o imported MyModule
> >>>>>>>>>>>>>>>>>> because its DWO skeleton is there in the same object file.
> >>>>>>>>>>>>>>>>>> But if we deal with the result of an LTO compilation we
> >>>>>>>>>>>>>>>>>> will end up with many compile units in the same
> >>>>>>>>>>>>>>>>>> .debug_info section, plus a bunch of skeleton compile
> >>>>>>>>>>>>>>>>>> units for _all_ imported modules in the entire project. We
> >>>>>>>>>>>>>>>>>> thus lose the ability to determine which of the compile
> >>>>>>>>>>>>>>>>>> units imported which module.
> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Why would we need to know which CU imported which modules? > >>>>>>>>>>>>>>>>> (I can imagine some possible reasons, but wondering what > >>>>>>>>>>>>>>>>> you have in mind) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> When the debugger is stopped at a breakpoint and the user > >>>>>>>>>>>>>>>> wants to evaluate an expression, it should import the > >>>>>>>>>>>>>>>> modules that are available at this location, so the user can > >>>>>>>>>>>>>>>> write the expression from within the context of the > >>>>>>>>>>>>>>>> breakpoint (e.g., without having to fully qualify each type, > >>>>>>>>>>>>>>>> etc). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm not sure how much current debuggers actually worry about > >>>>>>>>>>>>>>> that - (& this may differ from lldb to gdb to other things, > >>>>>>>>>>>>>>> of course). I'm pretty sure at least for GDB, a context in > >>>>>>>>>>>>>>> one CU is as good as one in another (at least without > >>>>>>>>>>>>>>> split-dwarf, type units, etc - with those sometimes things > >>>>>>>>>>>>>>> end up overly restrictive as the debugger won't search > >>>>>>>>>>>>>>> everything properly). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> eg: if you have a.cpp: int main() { }, b.cpp: void func() { } > >>>>>>>>>>>>>>> and you run 'start' in gdb (which breaks at the beginning of > >>>>>>>>>>>>>>> main) you can still run 'p func()' to call the func, even > >>>>>>>>>>>>>>> though there's no declaration of it in a.cpp, etc. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> LLDB would definitely care (as it is using clang for the > >>>>>>>>>>>>>> expression evaluation supporting these kinds of features is > >>>>>>>>>>>>>> really straightforward there). 
> >>>>>>>>>>>>>> By importing the modules
> >>>>>>>>>>>>>> (rather than searching through the DWARF), the expression
> >>>>>>>>>>>>>> evaluator gains access to additional declarations that are not
> >>>>>>>>>>>>>> there in the DWARF, such as templates. But since clang modules
> >>>>>>>>>>>>>> are not namespaces, we can’t generally "import the world" as a
> >>>>>>>>>>>>>> debugger would usually do.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry, not sure I understand this last sentence - could you
> >>>>>>>>>>>>> explain further?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I imagine it would be rather limiting for the user if they
> >>>>>>>>>>>>> could only use expressions that are valid in this file from the
> >>>>>>>>>>>>> file - it wouldn't be uncommon to want to call a function from
> >>>>>>>>>>>>> another module/file/etc to aid in debugging.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Usually LLDB’s expression evaluator works by creating a clang
> >>>>>>>>>>>> AST type out of a DWARF type and inserting it into its AST
> >>>>>>>>>>>> context. We could pre-populate it with the definitions from the
> >>>>>>>>>>>> imported modules (with all sorts of benefits as described
> >>>>>>>>>>>> above), but that only works if no two modules conflict. If the
> >>>>>>>>>>>> declaration can’t be found in any imported module, LLDB would
> >>>>>>>>>>>> still import it from DWARF in the “traditional” fashion.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> But it would import it from DWARF in other TUs rather than use
> >>>>>>>>>>> the module info just because the module wasn't directly
> >>>>>>>>>>> referenced from this TU? That would seem strange to me.
(you > >>>>>>>>>>> would lose debug info fidelity (by falling back to DWARF even > >>>>>>>>>>> though there are modules with the full fidelity info) > >>>>>>>>>>> unnecessarily, it sounds like) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I think it’s reasonable to expect full fidelity for everything > >>>>>>>>>> that is available in the current TU, and having the normal > >>>>>>>>>> DWARF-based debugging capabilities for everything beyond that. But > >>>>>>>>>> we can only ever provide full fidelity if we have the list of > >>>>>>>>>> imports for the current TU. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Would it be reasonable to use the accelerator table/index to > >>>>>>>>>> lookup the types, then if the type is in the module you could use > >>>>>>>>>> the module rather than the DWARF stashed alongside it? (so the > >>>>>>>>>> comdat'd split-dwarf skeleton CU for the module would have an > >>>>>>>>>> index to tell you what names are inside it, but if you got an > >>>>>>>>>> index hit you'd just look at the module instead of loading the > >>>>>>>>>> split-dwarf debug info in the referenced file) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I don’t think this approach would work for templates and > >>>>>>>>>> enumerator values; > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Not sure why enumerator values are an issue - but templates (& all > >>>>>>>>> manner of other things that don't make it into the index, > >>>>>>>>> unfortunately), sure. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> they aren’t in the accelerator tables to begin with. It would also > >>>>>>>>>> be slower if the declaration is available in a module. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Though you're rapidly going to end up loading a lot of modules in > >>>>>>>>> (as you go up & down a stack printing various things you'll cross > >>>>>>>>> into other TUs & load more modules). 
> >>>>>>>>> > >>>>>>>>> For a standard DWARF consumer, it seems fine to just have a > >>>>>>>>> comdat'd skeleton CU for a module without the need for other CUs to > >>>>>>>>> mention which module CUs they reference (but I could be wrong here) > >>>>>>>>> & that's the design we originally discussed. > >>>>>>>>> > >>>>>>>>> It would seem unfortunate to bloat every CU with a non-deduplicable > >>>>>>>>> list of every module it references, but if that's necessary for a > >>>>>>>>> serialized AST aware debugger, it might be fine to have it as an > >>>>>>>>> option (so long as it can be turned off) & may still benefit from > >>>>>>>>> that list not being the authoritative module reference, but a > >>>>>>>>> /very/ terse reference to it so all the extra flags & stuff can be > >>>>>>>>> in the deduplicable comdat (& to keep it as consistent as possible > >>>>>>>>> between the flag (on/off) codepaths for this extra data). Maybe a > >>>>>>>>> FORM_block (?) of fixed-size hashes of all the modules > >>>>>>>>> back-to-back, so it's as small as possible? > >>>>>>>>> > >>>>>>>>> But I wouldn't mind spending some more time discussing whether > >>>>>>>>> there's a better way to keep these things streamlined/symmetric/the > >>>>>>>>> same between modular and non-modular debug info. > >>>>>>>> > >>>>>>>> Sure! > >>>>>>>> > >>>>>>>> Now that we established that recording the list of imported modules > >>>>>>>> for every CU is useful for an AST-based debugger, > >>>>>>>> > >>>>>>>> > >>>>>>>> +Richard, just to see if he's got some ideas about how a debugger > >>>>>>>> might efficiently use modules to support debugger scenarios and > >>>>>>>> whether or not having a list of which modules are referenced from > >>>>>>>> which contexts is valuable in that. 
> >>>>>>>>
> >>>>>>>> It still concerns me that this would create something of a
> >>>>>>>> regression/oddity/difference between AST-based debug info (you
> >>>>>>>> wouldn't be able to handle expressions referencing things in other
> >>>>>>>> TUs) and non-AST based debug info (where I think the average user is
> >>>>>>>> used to not worrying about what headers are included in the current
> >>>>>>>> file they're debugging when they try to use a type or other
> >>>>>>>> identifier)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> If I understood you correctly, this is not actually the case. The
> >>>>>>> list of imported modules allows the AST-based debugger to import all
> >>>>>>> the modules that were imported by the CU that the current frame is
> >>>>>>> in. This enables the user to, e.g., type "p myVector->size()" even
> >>>>>>> though std::vector<MyClass>::size() was not used by the CU and is
> >>>>>>> thus not available in DWARF.
> >>>>>>>>
> >>>>>>>> If the user types “p foo” even though foo was not defined in any
> >>>>>>>> imported module the debugger can — after failing to import foo via
> >>>>>>>> clang — still fall back to looking up foo in DWARF and do what it
> >>>>>>>> always did.
> >>>>>>>
> >>>>>>>
> >>>>>>> If you do the DWARF fallback then you'll get a pretty clear
> >>>>>>> inconsistency between templates and non-templates. If I have a
> >>>>>>> function foo and a function template foo_tmpl in one file, and I'm
> >>>>>>> debugging in another file I'll be able to call 'foo' (normal DWARF
> >>>>>>> fallback/search) but not foo_tmpl (if I'm calling a new instantiation
> >>>>>>> of foo_tmpl - if I'm calling an existing instantiation presumably the
> >>>>>>> fallback would catch me). Seems unfortunate/confusing, perhaps.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Good point, but my guess is that this wouldn’t be any worse than
> >>>>>> the “why can’t I print the size() of this vector!?”-situations we have
> >>>>>> at the moment.
> >>>>>> > >>>>>> > >>>>>> Sure - it's strictly better in the sense that there are strictly more > >>>>>> expressions that can be evaluated, but seems incomplete is my point, > >>>>>> and maybe worth considering alternative designs that might be > >>>>>> more-betterer. > >>>>>> > >>>>>>> > >>>>>>> In certain situations (i.e., non-templates) the debugger could use > >>>>>>> the DWARF in the modules to print a message about which module to > >>>>>>> import. > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> let’s talk about how to most efficiently represent this information. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> In the CU, using DW_TAG_imported_module appears to be the most > >>>>>>>>> appropriate choice, even though there is some room for confusion > >>>>>>>>> since C++ using declarations are also represented this way. Inside > >>>>>>>>> the DW_TAG_imported_module, we could use > >>>>>>>>> > >>>>>>>>> (1) a DW_AT_import that references the skeleton (I hope that is the > >>>>>>>>> right terminology) CU for the module, the idea being that the > >>>>>>>>> skeleton CU would contain all the details (flags, name, include > >>>>>>>>> dirs, hash, ...) and be in a comdat'ed section. > >>>>>>>> > >>>>>>>> > >>>>>>>> I'd be concerned about overloading the terminology & confusing other > >>>>>>>> debuggers - they might try to follow the DW_AT_import and be > >>>>>>>> surprised that it doesn't refer to a DW_TAG_namespace tag. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> That’s a valid concern, and we probably should not be emitting this > >>>>>>>> if we have any evidence of, e.g., gdb crashes when encountering such > >>>>>>>> a construct. Then again, we would be using a DW_TAG_imported_module > >>>>>>>> to express what it is meant to express according to the DWARF spec > >>>>>>>> (namely importing a module)... 
but I admit that the tag also does > >>>>>>>> have a very specific meaning for C++, which we maybe shouldn’t > >>>>>>>> overload. > >>>>>>> > >>>>>>> > >>>>>>> That's my concern, yes. > >>>>>>> > >>>>>>>> > >>>>>>>> The right thing here is probably to put aside my personal sense of > >>>>>>>> aesthetics and use a private _LLVM_ namespace for all new additions, > >>>>>>>> and then attempt to standardize an official DWARF version once we > >>>>>>>> know what is really needed and what isn't. > >>>>>>> > >>>>>>> > >>>>>>> I'd prefer this, yes. I mean the usual bar we use for language > >>>>>>> features is that they're at least proposed for standardization before > >>>>>>> we adopt them in clang - I wouldn't mind a similar bar here. If you > >>>>>>> want to bring up this use of DW_TAG_imported_module with the DWARF > >>>>>>> committee & see if it sounds reasonable (& test/inquire about GDB's > >>>>>>> behavior here). > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> I started a thread on dwarf-discuss to this end > >>>>>>> (http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org > >>>>>>> <http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org>, > >>>>>>> the list archives are only visible to subscribers, but anyone can > >>>>>>> subscribe). > >>>>>> > >>>>>> > >>>>>> Cool cool > >>>>> > >>>>> > >>>>> > >>>>> To paraphrase the replies that my question solicited: We are, perhaps > >>>>> not very surprising, encouraged to follow the standard and use a > >>>>> DW_TAG_imported_module that references a DW_TAG_module. If we, however, > >>>>> choose to describe the module by using a skeleton DW_TAG_compile_unit, > >>>>> we should be careful (my own words) about using a > >>>>> DW_TAG_imported_module until that use is sanctioned by the standard. 
> >>>>> > >>>>> > >>>>> > >>>>> I see two possible ways to proceed in this spirit: > >>>>> > >>>>> a) Rename the module skeleton DW_TAG_compile_units to DW_TAG_module, > >>>>> but keep all the comdat/split dwarf goodness from the original proposal > >>>>> [1]. My understanding is that even though we are making clever use of > >>>>> the split DWARF features, GDB would still need to be taught to follow > >>>>> references to external files, > >>>>> > >>>>> > >>>>> Not sure what you're referring to here, perhaps a misunderstanding > >>>>> about how split DWARF works. > >>>>> > >>>>> To the best of my knowledge, what we've talked about for module DWARF > >>>>> debug info is actually just split-dwarf, no extra work required by > >>>>> DWARF consumers*. > >>>>> > >>>>> * It's, admittedly, a little tricksy to include type unit references in > >>>>> an object file that doesn't include the type unit at all - relying on > >>>>> it being linked into the final executable. But DWARF doesn't really > >>>>> talk about objects versus executables, etc - so, so long as the type > >>>>> unit is there in the end, it's valid DWARF no matter how it got there > >>>>> (& should work fine for existing consumers - they can't tell if the > >>>>> type unit was in every object file that referenced the type or not once > >>>>> it's been linked and deduplicated). > >>>>> > >>>>>> > >>>>>> so having it recognize a new tag in this context doesn’t appear to be > >>>>>> much additional effort (but others may provide more insight here). > >>>>> > >>>>> > >>>>> Beyond the above (that using a new tag would mean this would go from > >>>>> 'free' to 'not free' for GDB) having a new top level tag is pretty > >>>>> substantial (we only have two at the moment, and with our talk of > >>>>> modules being a "bag of dwarf" might go back to having one top level > >>>>> tag? (it's not clear to me from DWARF4 whether DW_TAG_module is > >>>>> currently a top-level tag, I don't think it is?) 
> >>>>> > >>>>>> > >>>>>> b) Emit an LLVM-specific DW_AT_LLVM_import attribute inside the > >>>>>> DW_TAG_imported_module (or vice versa) that refers to the skeleton > >>>>>> DW_TAG_compile_unit. > >>>>>> > >>>>>> > >>>>>> > >>>>>> I think that option (a) is a bit more elegant and it is bending the > >>>>>> dwarf standard not quite as much and will make the dwarf output a bit > >>>>>> more readable. > >>>>>> > >>>>>> > >>>>>> > >>>>>> -- adrian > >>>>>> > >>>>>> > >>>>>> > >>>>>> [1] Module debugging proposal for reference: > >>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html > >>>>>> <http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> - David > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- adrian > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> But extension tags seems like the conservatively correct option (not > >>>>>>> sure what GDB does on tags it doesn't recognize - I forget if it > >>>>>>> warns or just completely ignores them, hopefully the latter) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> (2) David’s suggestion of using a custom form that records the > >>>>>>>>>> module hash directly is quite space-efficient, but it has the > >>>>>>>>>> drawback of not being resilient against small changes to the > >>>>>>>>>> imported module > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> That's going to be true of the normal fission info here (the > >>>>>>>>> skeleton CU and the full CU in the .dwo file (or module) are > >>>>>>>>> associated by hash) - granted, in the "loading an AST" mode, you > >>>>>>>>> can ignore those hashes and rely on your custom attributes instead. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> , since clang’s module hash changes each time the module is being > >>>>>>>>>> rebuilt. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Clang's module hash only changes if the DWARF contents change - it > >>>>>>>>> doesn't use a timestamp or anything. 
It seems like actually you're > >>>>>>>>> going to want to fail to load even more aggressively - there are > >>>>>>>>> ways the AST might've changed that the debug info doesn't reflect > >>>>>>>>> but are still important (a type unreferenced in this module, but > >>>>>>>>> built into some other code that is not built with debug info > >>>>>>>>> changes - no hash changes because the debug info for that type is > >>>>>>>>> unreferenced here, but if you try to use it you could have an > >>>>>>>>> incompatible layout, etc). > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Agreed: If the module contents changed the debugger needs to display > >>>>>>>> a big flashing "here be dragons" warning. > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> This is less of an issue if the hash is referring to a skeleton CU > >>>>>>>>> in the same file, which contains all the detailed information. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Personally I’d prefer option 1 because mostly uses the existing > >>>>>>>>> mechanisms from DWARF. Here’s a visual guide to the options on the > >>>>>>>>> table: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> (1) > >>>>>>>>> > >>>>>>>>> foo.o (compiled with, let’s call it .. "-gmodule-imports”) > >>>>>>>>> > >>>>>>>>> ----- > >>>>>>>>> > >>>>>>>>> .debug_info: > >>>>>>>>> > >>>>>>>>> DW_TAG_compile_unit > >>>>>>>>> > >>>>>>>>> DW_AT_name(“foo.c”) > >>>>>>>>> > >>>>>>>>> DW_TAG_imported_module > >>>>>>>>> > >>>>>>>>> DW_AT_import(DW_FORM_ref_addr 0x123) // Could be a > >>>>>>>>> FORM_ref_sig8 0x1234ABCDE as well. > >>>>>>>>> > >>>>>>>>> DW_TAG_imported_module > >>>>>>>>> > >>>>>>>>> DW_AT_import(...) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> .debug_info.dwo: > >>>>>>>>> > >>>>>>>>> // Skeleton CUs for modules imported by foo.o. > >>>>>>>>> > >>>>>>>>> 0x123: > >>>>>>>>> > >>>>>>>>> DW_TAG_compile_unit > >>>>>>>>> > >>>>>>>>> // Used by split-dwarf debuggers to find external type > >>>>>>>>> definitions. 
> >>>>>>>>> > >>>>>>>>> > >>>>>>>>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) > >>>>>>>>> > >>>>>>>>> DW_AT_dwo_id(“0x1234ABCDE”) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> // Used by AST-based debuggers to import the module. > >>>>>>>>> > >>>>>>>>> DW_AT_name(“Foundation”) > >>>>>>>> > >>>>>>>> > >>>>>>>> (side notes: the mixed indentation here makes it a bit hard to read > >>>>>>>> this example, and I'd make sure /all/ the extended attributes > >>>>>>>> (including the name here) use custom attribute names, not standard > >>>>>>>> ones) > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Agreed. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> DW_AT_LLVM_sysroot(“/“) > >>>>>>>>> > >>>>>>>>> DW_AT_LLVM_include_dir(“”) > >>>>>>>>> > >>>>>>>>> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> (2) > >>>>>>>>> > >>>>>>>>> .debug_info.dwo: > >>>>>>>>> > >>>>>>>>> (As above.) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> .debug_info: > >>>>>>>>> > >>>>>>>>> DW_TAG_compile_unit > >>>>>>>>> > >>>>>>>>> DW_AT_name(“foo.c”) > >>>>>>>>> > >>>>>>>>> DW_AT_LLVM_imported_modules(DW_FORM_block 0x1234ABCDE > >>>>>>>>> 0xDEADBEEF 0x....) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Now I’m curious what option (3) will look like; the one that we’ll > >>>>>>>>> actually implement! > >>>>>>>> > >>>>>>>> > >>>>>>>> ;) > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> -- adrian > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >> > >
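For reference, the split-DWARF flow described at the top of the thread (skeleton compile_units in the .o files, comdat deduplication at link time, the debugger following DW_AT_dwo_name and checking DW_AT_dwo_id) can be sketched as a toy model. This is only an illustration under stated assumptions: SkeletonCU, link and load_split_units are hypothetical names, the "filesystem" is a plain dict, and deduplicating every skeleton by its dwo_id is a simplification of what a real linker does with comdat sections. Nothing here is LLVM, linker, or GDB code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkeletonCU:
    """Hypothetical stand-in for a skeleton compile_unit."""
    dwo_name: str  # models DW_AT_dwo_name: where the split debug info lives
    dwo_id: int    # models DW_AT_dwo_id: hash used to detect a stale file


def link(objects):
    """Model comdat deduplication: keep one skeleton CU per dwo_id."""
    seen = {}
    for skeletons in objects:
        for cu in skeletons:
            seen.setdefault(cu.dwo_id, cu)
    return list(seen.values())


def load_split_units(executable, filesystem):
    """Model the debugger: follow each skeleton CU and verify its hash.

    `filesystem` maps a path to the dwo_id stored in that file.
    Returns (paths loaded, paths that were missing or stale).
    """
    loaded, stale = [], []
    for cu in executable:
        if filesystem.get(cu.dwo_name) == cu.dwo_id:
            loaded.append(cu.dwo_name)
        else:
            stale.append(cu.dwo_name)
    return loaded, stale


# Two object files that both import the same module.
module = SkeletonCU("/path/to/module-cache/MyModule.pcm", 0x1234)
foo_o = [SkeletonCU("foo.dwo", 0xA), module]
bar_o = [SkeletonCU("bar.dwo", 0xB), module]

exe = link([foo_o, bar_o])

# bar.dwo was rebuilt after linking, so its hash no longer matches.
fs = {
    "foo.dwo": 0xA,
    "bar.dwo": 0xFF,
    "/path/to/module-cache/MyModule.pcm": 0x1234,
}
loaded, stale = load_split_units(exe, fs)
print(len(exe), loaded, stale)
```

In this model the module's skeleton CU appears once in the linked output even though two objects referenced it, and the debugger treats the .pcm path exactly like any other .dwo path, which is the "no extra work required" property discussed above.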
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
