Re: [PATCH] Have clang list the imported modules in the debug info

Adrian Prantl Tue, 05 May 2015 20:09:58 -0700

> On May 5, 2015, at 3:14 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Tue, May 5, 2015 at 3:01 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> > On May 5, 2015, at 9:59 AM, David Blaikie <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> >
> >
> > On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> >>> On May 4, 2015, at 1:31 PM, David Blaikie <[email protected] 
> >>> <mailto:[email protected]>> wrote:
> >>>
> >> ...
> >>>> >>> So you're going to need to implement fission (to at least some 
> >>>> >>> degree) support in LLDB, then? (to support the case where you 
> >>>> >>> haven't linked debug info with llvm-dsymutil, but you've hit one of 
> >>>> >>> these lookup problems where you need to cross possibly-conflicting 
> >>>> >>> modules)
> >>>> >>
> >>>> >> Yes. Specifically, it won’t support type units, and it will look up 
> >>>> >> types by name rather than by signature. (cf. the second part of 
> >>>> >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
> >>>> >>  
> >>>> >> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>)
> >>>> >
> >>>> > How are you going to reference the types in the module's fission CU 
> >>>> > without type units/signatures? Are you going to emit type declarations 
> >>>> > into the normal CU and rely on the debugger to know that these 
> >>>> > declarations can be resolved by looking elsewhere? (just without the 
> >>>> > benefit of constraining that search to just looking for a matching TU?)
> >>>>
> >>>> If you look at the example in 
> >>>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
> >>>>  
> >>>> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>,
> >>>>  there will be an external type index (using the usual accelerator table 
> >>>> format) that maps an external type’s UID to a pcm. In the pcm there is 
> >>>> an extra accelerator table entry that maps UID to DIE offset.
> >>>
> >>> I mean I guess that's up to you, but seems like a relatively large 
> >>> workaround compared to supporting type units... (I mean certainly seems 
> >>> like strictly less work to do the workaround than implementing type units 
> >>> in LLDB, but a relatively large amount of work to do/throw away 
> >>> eventually once LLDB supports type units)
> >>
> >> It’s not primarily meant to be a workaround so LLDB doesn’t need to 
> >> implement type units; the UIDs double as the key (decl context + name) to 
> >> import types directly from the AST. The other advantage is that we won’t 
> >> have to worry about MD5 hash collisions, but that’s more of a theoretical 
> >> advantage.
> >
> > If there's a concern about collisions, we should fix that regardless 
> > (because we'll have the same problem with normal type units or modularized 
> > type units).
> 
> My understanding is that split DWARF is disambiguating hash collisions by 
> adding the full decl context and name to both the definition and the forward 
> declaration:
> 
> No, that's not a requirement. It's just an aspect of our implementation so we 
> can put entities in the type when needed (such as for putting member function 
> declarations in there to refererence from out of line member function 
> definitions)
> 
> In  the simplest case, you can reference a type unit directly:
> 
> DW_TAG_variable
>   DW_AT_type [DW_FORM_ref_sig8] ....


You’re right. The standard allows this, although all the examples in the 
appendix show the long form that LLVM also emits.
> 
> Without any DW_TAG_structure_type/namespaces/etc. Once we have Bag O' DWARF, 
> we'd move to that representation almost entirely. (the signature we're using 
> for this is the mangled name of the type so it includes the decl context in 
> the hash anyway)

.. which, incidentally, will look a lot like this UID example, right?

> (for completeness, the UID type reference would look like this:
>  DW_TAG_variable
>    DW_AT_name “myC”
>    DW_AT_type [DW_FORM_strp] “_ZTN1AS1BS1C” )
> 

Oh I see, you mean hashing the mangled name and using it as a signature rather 
than emitting the name directly. But then, to disambiguate, it would still be 
beneficial to have the mangled name available.

> 
> 
> So I think that hash collisions are actually handled for normal fission and 
> we don’t need to worry about them. The bag o’dwarf will still contain the 
> full decl context in the DWO, so it will be fine there, too.
> 
> >
> > If we've already got (& have to use in the case of DWARF fallback for 
> > modules) to support a hash or other numeric identifier, it might be good to 
> > use that for everything rather than having two mechanisms?
> 
> Generally, one way to do it is of course better than maintaining two :-)
> But using the UIDs in the .o file allows us to elide the the forward 
> declarations from the .o file
> 
> It doesn't, though - because we need the forward declarations to put members 
> into (such as member function definitions - this'll come up even for modules 
> quite commonly with any inline member function, for example - its definition 
> will appear in the .o file and the definition DW_TAG_subprogram will need to 
> DW_AT_specification pointing to the declaration of the member function - 
> which is a DW_TAG_subprogram inside the DW_TAG_class that is a declaration 
> with a signature that refers to the type unit).

Couldn’t we make the DW_AT_specification an external type reference pointing to 
the full type in the module?
> 
> Only once we've got Bag O' DWARF do we remove the need for these declarations.
> 
> Granted we can get rid of them in /some/ cases today - when there's no need 
> to reference members in the type (or nest other members - like a nested 
> class, for example - though we could possibly workaround even that with Bag O 
> DWARF by having the type in the type unit have a declaration of the nested 
> type - then reference that from the nested type definition in another type 
> unit or .o file, etc).
> 
> I don't think there's anything you can leverage here that's different from 
> what type units need. If you're going to do anything different from the way 
> type units work today, I'd really like to discuss that further and understand 
> what's different here that benefits from/allows for that difference that 
> doesn't apply to type units themselves.
>  
> (granted the UIDs carry the same information, so it’s not actually as small 
> as it looks like) so all that dsymutil needs to do is replace the string 
> references with FORM_ref_addrs, no further stripping needed.

Good point. I don’t have any other secret plans up my sleeve apart from what I 
already outlined in this thread :-)
I see the UID type references for LLDB like a bag o’dwarf version 0.5. Any 
lessons learned there (like the hash collisions) should definitely be addressed 
in the final bag o’dwarf design.

> 
> 
> >
> >>
> >>>
> >>>
> >>> >>
> >>> >>
> >>> >>>
> >>> >>> OK, so I think it's probably reasonable for now to just add 
> >>> >>> DW_TAG_modules to the CU for each referenced module (or does it have 
> >>> >>> to be each referenced submodule? (can two submodules within a single 
> >>> >>> module be contradictory/conflicting?)). Since we don't have any good 
> >>> >>> way to reference the module is a foreign unit while deduplicating 
> >>> >>> that unit... there's not much point having the imported_module - but 
> >>> >>> if you think it adds anything, I'm open to ideas.
> >>> >> It could help keeping things simpler.
> >>> >> Emitting it doesn’t add much semantic value because module imports 
> >>> >> always occur at the top level, but it will make the transition to the 
> >>> >> deduplicated TAG_modules easier — It could be easier to teach 
> >>> >> consumers once about imported_module({ref to TAG_module}) rather than 
> >>> >> having them also recognize top-level TAG_modules as an intermediate 
> >>> >> step. It’s also slightly easier to implement in LLVM because the 
> >>> >> imported_module allows us to anchor the TAG_module in the CU, but 
> >>> >> that’s not a very strong argument.
> >>> >
> >>> > Agreed on all counts (not a strong argument, but convenient enough, 
> >>> > etc, etc).
> >>> >
> >>> > I'm still not entirely sure what the right answer is here, though, 
> >>> > which is why I'm hesitant to bake anything in too strongly.
> >>> >
> >>> > To come back to one of the outstanding questions: Do you need submodule 
> >>> > import information, or just module level (if modules cannot have 
> >>> > internal conflicts and you can't avoid cross-module conflicts just by 
> >>> > lack of visibility (I have no idea if either of those things are true) 
> >>> > then you may just need per-module not per-submodule info)?
> >>>
> >>> At the moment I do not think that it makes sense for two submodules to 
> >>> conflict, but there is nothing in the clang documentation that explicitly 
> >>> forbids this. With this in mind, I think it is reasonable to not support 
> >>> submodules (at least initially) and always emit an import for the parent 
> >>> module.
> >>> Thats what I wanted to write ... but I as I’m browsing through our 
> >>> documentation, 
> >>> http://clang.llvm.org/docs/Modules.html#conflict-declarations 
> >>> <http://clang.llvm.org/docs/Modules.html#conflict-declarations> 
> >>> explicitly gives an example of two conflicting submodules, so maybe this 
> >>> is not a reasonable simplification after all. On the other hand, a quick 
> >>> grep over all system module maps on OS X doesn’t show a single conflict 
> >>> declaration.
> >>>
> >>> I still believe we do not need to support submodules right from the 
> >>> start, but we should have a story for getting there if we need to.
> >>>
> >>> Given the simple example that demonstrates the possibility, it seems fair 
> >>> to have a story for what that looks like, yes - even if a first 
> >>> pass/prototype doesn't support it.
> >>
> >> Sure.
> >>>
> >>>
> >>> >
> >>> > Also, does each submodule need different special attributes/flags? If 
> >>> > the special codegen attributes you want are at the module level, it'd 
> >>> > probably be best to keep those on the Skeleton CU for the module (that 
> >>> > will be comdat folded, etc, on ELF - and they could be DWARF-aware 
> >>> > deduplicated by llvm-dsymutil) so they're not duplicated. The 
> >>> > DW_TAG_module would then just have a DW_AT_signature attribute or 
> >>> > something similarly small/trivial to point to the skeleton CU.
> >>>
> >>> The attributes are derived from cc1 command line arguments. Not two 
> >>> submodules imported by one CU can have different attributes. All 
> >>> submodules in a pcm also share their attributes. Putting them into the 
> >>> skeleton CU appears to be the most efficient place to put them, though 
> >>> perhaps not the most logical one.
> >>>
> >>> Why not the most logical? It'd be nice if it were a DW_TAG_module instead 
> >>> of a DW_TAG_compile_unit - but given the limited vocabulary we have in 
> >>> DWARF top level tags, it seems as good as we can have.
> >>
> >> I tend to view the module configuration (include path, isysroot, 
> >> configuration macros) to be a part of the module and not a part of the 
> >> skeleton that points to the split debug info for that module.
> >
> > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way or 
> > another. Either way the debugger's going to have to jump the
> 
> [there’s something missing towards the end]
> No it’s not a workaround. Having the full module configuration is what allows 
> LLDB to rebuild the module if the module cache has been cleared.
> >
> >> A module is uniquely identified by name + configuration. That’s why I feel 
> >> it should be part of the tag that also holds the name.
> >>
> >>>
> >>> I would prefer to stick the attributes on the (top-level) DW_TAG_module 
> >>> and later deduplicate the attributes together with the DW_TAG_module. 
> >>> Sticking them on the skeleton won’t save any space in the .o files and 
> >>> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot - 
> >>> 1x FORM_ref_sig_8) per CU and imported module.
> >>>
> >>> Seems nicer not to duplicate them, especially since not everyone will be 
> >>> using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux 
> >>> won't have that convenience). Eventually we can use Bag O' DWARF for the 
> >>> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow 
> >>> that as a top-level tag, if desired/useful - I'm not sure it adds a lot) 
> >>> and have the imported_module reference it that way. 
> >>> (DW_TAG_imported_module, DW_AT_import, DW_FORM_ref_sig8)
> >>>
> >>> I'm not /hugely/ invested in this, but we do have people caring about 
> >>> LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and 
> >>> dsymutil, etc, seems valuable.
> >>
> >> I think that this would be an unnecessary intermediate step that we 
> >> eventually want to migrate away from anyway. We already identified that 
> >> the good solution for deduplication is going to be a skeleton TAG_module, 
> >> so my view is that it is not worth the trouble adding a temporary 
> >> indirection (and a new attribute name)
> >
> > New attribute name?
> 
> What attribute would we put into the TAG_module to refer to the skeleton CU?
> 
> I imagine we'd just match by name in that instance.

That doesn’t generally work. I could have a.c import Foo and b.c import Foo 
-DNDEBUG. If the debugger loads the debug info from the linked .dSYM bundle it 
wouldn’t be able to tell which version of Foo to import. Actually, that’s a 
really good argument why the module configuration needs to sit inside the 
TAG_module.
> 
> (& eventually use a sig8, etc)

Unless we refer to it via ref_sig8 (and disregard collisions, which I think 
might be statistically fine for module configurations), which we already agreed 
to leave for the next revision.

-- adrian
>  
> 
> >
> >> to save 4 bytes in the intermediate step.
> >
> > The debugger's going to need to resolve the skeleton anyway (in the case of 
> > non-AST based debugging) so I'm not sure how much it's an extra step.
> 
> My “intermediate step” was referring to the non-dedup’ed TAG_module that is 
> not (yet) part of the dedup’ed skeleton CU (but will be in the next 
> iteration). I just didn’t think that it is worth to optimize a representation 
> that will be changed soon anyway.
> 
> It doesn't seem to cost much, but I'll give up on this.
>  
> 
> >
> >> I don’t actually think there is anything about the TAG_module design tying 
> >> this to either MachO or dsymutil, but let me know if you feel otherwise.
> >
> > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb story 
> > you probably don't need to consult the skeleton debug info (actually I 
> > forgot, you do - in the case where you need a name from a module that might 
> > be incompatible (it's not referenced directly in this CU)) - pre-dsymutil, 
> > you'd use the ASTs directly, post-dsymutil I expect you'll have inlined all 
> > the debug info so there are no skeletons, etc, if I'm understanding your 
> > design correctly.
> 
> Right, we need all of it :-)
> 
> -- adrian
> 
> >
> > - David
> >
> >>
> >> -- adrian
> >>>
> >>> >
> >>> > If you need submodule import lists, then each DW_AT_module representing 
> >>> > a submodule would have a name (anything else?) and the signature 
> >>> > refering to its module skeleton CU.
> >>>
> >>> What I’m envisioning is
> >>>
> >>> .debug_info:
> >>>   DW_TAG_compile_unit
> >>>     ...
> >>>     DW_TAG_imported_module
> >>>      // import FooSubA
> >>>      DW_AT_import [DW_FORM_ref4] (0x60)
> >>>
> >>>     DW_TAG_module
> >>>       DW_AT_name(“FooLib”)
> >>>       DW_AT_LLVM_sysroot(“/“)
> >>>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>> 0x60:
> >>>       DW_TAG_module
> >>>         DW_AT_name(“FooSubA”)
> >>>         // need not be emitted if not referenced.
> >>>         DW_TAG_module
> >>>           DW_AT_name(“FooSubASubA”)
> >>>
> >>>       // need not be emitted if not referenced.
> >>>       DW_TAG_module
> >>>         DW_AT_name(“FooSubB”)
> >>>
> >>>
> >>>
> >>> -- adrian
> >>
> >>
> >
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Have clang list the imported modules in the debug info

Reply via email to