> On May 5, 2015, at 3:14 PM, David Blaikie <[email protected]> wrote:
>
>
>
> On Tue, May 5, 2015 at 3:01 PM, Adrian Prantl <[email protected]
> <mailto:[email protected]>> wrote:
>
> > On May 5, 2015, at 9:59 AM, David Blaikie <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >
> >
> > On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >>> On May 4, 2015, at 1:31 PM, David Blaikie <[email protected]
> >>> <mailto:[email protected]>> wrote:
> >>>
> >> ...
> >>>> >>> So you're going to need to implement fission (to at least some
> >>>> >>> degree) support in LLDB, then? (to support the case where you
> >>>> >>> haven't linked debug info with llvm-dsymutil, but you've hit one of
> >>>> >>> these lookup problems where you need to cross possibly-conflicting
> >>>> >>> modules)
> >>>> >>
> >>>> >> Yes. Specifically, it won’t support type units, and it will look up
> >>>> >> types by name rather than by signature. (cf. the second part of
> >>>> >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
> >>>> >>
> >>>> >> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>)
> >>>> >
> >>>> > How are you going to reference the types in the module's fission CU
> >>>> > without type units/signatures? Are you going to emit type declarations
> >>>> > into the normal CU and rely on the debugger to know that these
> >>>> > declarations can be resolved by looking elsewhere? (just without the
> >>>> > benefit of constraining that search to just looking for a matching TU?)
> >>>>
> >>>> If you look at the example in
> >>>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
> >>>>
> >>>> <http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html>,
> >>>> there will be an external type index (using the usual accelerator table
> >>>> format) that maps an external type’s UID to a pcm. In the pcm there is
> >>>> an extra accelerator table entry that maps UID to DIE offset.
> >>>
> >>> I mean I guess that's up to you, but seems like a relatively large
> >>> workaround compared to supporting type units... (I mean certainly seems
> >>> like strictly less work to do the workaround than implementing type units
> >>> in LLDB, but a relatively large amount of work to do/throw away
> >>> eventually once LLDB supports type units)
> >>
> >> It’s not primarily meant to be a workaround so LLDB doesn’t need to
> >> implement type units; the UIDs double as the key (decl context + name) to
> >> import types directly from the AST. The other advantage is that we won’t
> >> have to worry about MD5 hash collisions, but that’s more of a theoretical
> >> advantage.
> >
> > If there's a concern about collisions, we should fix that regardless
> > (because we'll have the same problem with normal type units or modularized
> > type units).
>
> My understanding is that split DWARF is disambiguating hash collisions by
> adding the full decl context and name to both the definition and the forward
> declaration:
>
> No, that's not a requirement. It's just an aspect of our implementation so we
> can put entities in the type when needed (such as for putting member function
> declarations in there to refererence from out of line member function
> definitions)
>
> In the simplest case, you can reference a type unit directly:
>
> DW_TAG_variable
> DW_AT_type [DW_FORM_ref_sig8] ....
You’re right. The standard allows this, although all the examples in the
appendix show the long form that LLVM also emits.
>
> Without any DW_TAG_structure_type/namespaces/etc. Once we have Bag O' DWARF,
> we'd move to that representation almost entirely. (the signature we're using
> for this is the mangled name of the type so it includes the decl context in
> the hash anyway)
.. which, incidentally, will look a lot like this UID example, right?
> (for completeness, the UID type reference would look like this:
> DW_TAG_variable
> DW_AT_name “myC”
> DW_AT_type [DW_FORM_strp] “_ZTN1AS1BS1C” )
>
Oh I see, you mean hashing the mangled name and using it as a signature rather
than emitting the name directly. But then, to disambiguate, it would still be
beneficial to have the mangled name available.
>
>
> So I think that hash collisions are actually handled for normal fission and
> we don’t need to worry about them. The bag o’dwarf will still contain the
> full decl context in the DWO, so it will be fine there, too.
>
> >
> > If we've already got (& have to use in the case of DWARF fallback for
> > modules) to support a hash or other numeric identifier, it might be good to
> > use that for everything rather than having two mechanisms?
>
> Generally, one way to do it is of course better than maintaining two :-)
> But using the UIDs in the .o file allows us to elide the the forward
> declarations from the .o file
>
> It doesn't, though - because we need the forward declarations to put members
> into (such as member function definitions - this'll come up even for modules
> quite commonly with any inline member function, for example - its definition
> will appear in the .o file and the definition DW_TAG_subprogram will need to
> DW_AT_specification pointing to the declaration of the member function -
> which is a DW_TAG_subprogram inside the DW_TAG_class that is a declaration
> with a signature that refers to the type unit).
Couldn’t we make the DW_AT_specification an external type reference pointing to
the full type in the module?
>
> Only once we've got Bag O' DWARF do we remove the need for these declarations.
>
> Granted we can get rid of them in /some/ cases today - when there's no need
> to reference members in the type (or nest other members - like a nested
> class, for example - though we could possibly workaround even that with Bag O
> DWARF by having the type in the type unit have a declaration of the nested
> type - then reference that from the nested type definition in another type
> unit or .o file, etc).
>
> I don't think there's anything you can leverage here that's different from
> what type units need. If you're going to do anything different from the way
> type units work today, I'd really like to discuss that further and understand
> what's different here that benefits from/allows for that difference that
> doesn't apply to type units themselves.
>
> (granted the UIDs carry the same information, so it’s not actually as small
> as it looks like) so all that dsymutil needs to do is replace the string
> references with FORM_ref_addrs, no further stripping needed.
Good point. I don’t have any other secret plans up my sleeve apart from what I
already outlined in this thread :-)
I see the UID type references for LLDB like a bag o’dwarf version 0.5. Any
lessons learned there (like the hash collisions) should definitely be addressed
in the final bag o’dwarf design.
>
>
> >
> >>
> >>>
> >>>
> >>> >>
> >>> >>
> >>> >>>
> >>> >>> OK, so I think it's probably reasonable for now to just add
> >>> >>> DW_TAG_modules to the CU for each referenced module (or does it have
> >>> >>> to be each referenced submodule? (can two submodules within a single
> >>> >>> module be contradictory/conflicting?)). Since we don't have any good
> >>> >>> way to reference the module is a foreign unit while deduplicating
> >>> >>> that unit... there's not much point having the imported_module - but
> >>> >>> if you think it adds anything, I'm open to ideas.
> >>> >> It could help keeping things simpler.
> >>> >> Emitting it doesn’t add much semantic value because module imports
> >>> >> always occur at the top level, but it will make the transition to the
> >>> >> deduplicated TAG_modules easier — It could be easier to teach
> >>> >> consumers once about imported_module({ref to TAG_module}) rather than
> >>> >> having them also recognize top-level TAG_modules as an intermediate
> >>> >> step. It’s also slightly easier to implement in LLVM because the
> >>> >> imported_module allows us to anchor the TAG_module in the CU, but
> >>> >> that’s not a very strong argument.
> >>> >
> >>> > Agreed on all counts (not a strong argument, but convenient enough,
> >>> > etc, etc).
> >>> >
> >>> > I'm still not entirely sure what the right answer is here, though,
> >>> > which is why I'm hesitant to bake anything in too strongly.
> >>> >
> >>> > To come back to one of the outstanding questions: Do you need submodule
> >>> > import information, or just module level (if modules cannot have
> >>> > internal conflicts and you can't avoid cross-module conflicts just by
> >>> > lack of visibility (I have no idea if either of those things are true)
> >>> > then you may just need per-module not per-submodule info)?
> >>>
> >>> At the moment I do not think that it makes sense for two submodules to
> >>> conflict, but there is nothing in the clang documentation that explicitly
> >>> forbids this. With this in mind, I think it is reasonable to not support
> >>> submodules (at least initially) and always emit an import for the parent
> >>> module.
> >>> Thats what I wanted to write ... but I as I’m browsing through our
> >>> documentation,
> >>> http://clang.llvm.org/docs/Modules.html#conflict-declarations
> >>> <http://clang.llvm.org/docs/Modules.html#conflict-declarations>
> >>> explicitly gives an example of two conflicting submodules, so maybe this
> >>> is not a reasonable simplification after all. On the other hand, a quick
> >>> grep over all system module maps on OS X doesn’t show a single conflict
> >>> declaration.
> >>>
> >>> I still believe we do not need to support submodules right from the
> >>> start, but we should have a story for getting there if we need to.
> >>>
> >>> Given the simple example that demonstrates the possibility, it seems fair
> >>> to have a story for what that looks like, yes - even if a first
> >>> pass/prototype doesn't support it.
> >>
> >> Sure.
> >>>
> >>>
> >>> >
> >>> > Also, does each submodule need different special attributes/flags? If
> >>> > the special codegen attributes you want are at the module level, it'd
> >>> > probably be best to keep those on the Skeleton CU for the module (that
> >>> > will be comdat folded, etc, on ELF - and they could be DWARF-aware
> >>> > deduplicated by llvm-dsymutil) so they're not duplicated. The
> >>> > DW_TAG_module would then just have a DW_AT_signature attribute or
> >>> > something similarly small/trivial to point to the skeleton CU.
> >>>
> >>> The attributes are derived from cc1 command line arguments. Not two
> >>> submodules imported by one CU can have different attributes. All
> >>> submodules in a pcm also share their attributes. Putting them into the
> >>> skeleton CU appears to be the most efficient place to put them, though
> >>> perhaps not the most logical one.
> >>>
> >>> Why not the most logical? It'd be nice if it were a DW_TAG_module instead
> >>> of a DW_TAG_compile_unit - but given the limited vocabulary we have in
> >>> DWARF top level tags, it seems as good as we can have.
> >>
> >> I tend to view the module configuration (include path, isysroot,
> >> configuration macros) to be a part of the module and not a part of the
> >> skeleton that points to the split debug info for that module.
> >
> > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way or
> > another. Either way the debugger's going to have to jump the
>
> [there’s something missing towards the end]
> No it’s not a workaround. Having the full module configuration is what allows
> LLDB to rebuild the module if the module cache has been cleared.
> >
> >> A module is uniquely identified by name + configuration. That’s why I feel
> >> it should be part of the tag that also holds the name.
> >>
> >>>
> >>> I would prefer to stick the attributes on the (top-level) DW_TAG_module
> >>> and later deduplicate the attributes together with the DW_TAG_module.
> >>> Sticking them on the skeleton won’t save any space in the .o files and
> >>> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot -
> >>> 1x FORM_ref_sig_8) per CU and imported module.
> >>>
> >>> Seems nicer not to duplicate them, especially since not everyone will be
> >>> using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux
> >>> won't have that convenience). Eventually we can use Bag O' DWARF for the
> >>> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow
> >>> that as a top-level tag, if desired/useful - I'm not sure it adds a lot)
> >>> and have the imported_module reference it that way.
> >>> (DW_TAG_imported_module, DW_AT_import, DW_FORM_ref_sig8)
> >>>
> >>> I'm not /hugely/ invested in this, but we do have people caring about
> >>> LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and
> >>> dsymutil, etc, seems valuable.
> >>
> >> I think that this would be an unnecessary intermediate step that we
> >> eventually want to migrate away from anyway. We already identified that
> >> the good solution for deduplication is going to be a skeleton TAG_module,
> >> so my view is that it is not worth the trouble adding a temporary
> >> indirection (and a new attribute name)
> >
> > New attribute name?
>
> What attribute would we put into the TAG_module to refer to the skeleton CU?
>
> I imagine we'd just match by name in that instance.
That doesn’t generally work. I could have a.c import Foo and b.c import Foo
-DNDEBUG. If the debugger loads the debug info from the linked .dSYM bundle it
wouldn’t be able to tell which version of Foo to import. Actually, that’s a
really good argument why the module configuration needs to sit inside the
TAG_module.
>
> (& eventually use a sig8, etc)
Unless we refer to it via ref_sig8 (and disregard collisions, which I think
might be statistically fine for module configurations), which we already agreed
to leave for the next revision.
-- adrian
>
>
> >
> >> to save 4 bytes in the intermediate step.
> >
> > The debugger's going to need to resolve the skeleton anyway (in the case of
> > non-AST based debugging) so I'm not sure how much it's an extra step.
>
> My “intermediate step” was referring to the non-dedup’ed TAG_module that is
> not (yet) part of the dedup’ed skeleton CU (but will be in the next
> iteration). I just didn’t think that it is worth to optimize a representation
> that will be changed soon anyway.
>
> It doesn't seem to cost much, but I'll give up on this.
>
>
> >
> >> I don’t actually think there is anything about the TAG_module design tying
> >> this to either MachO or dsymutil, but let me know if you feel otherwise.
> >
> > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb story
> > you probably don't need to consult the skeleton debug info (actually I
> > forgot, you do - in the case where you need a name from a module that might
> > be incompatible (it's not referenced directly in this CU)) - pre-dsymutil,
> > you'd use the ASTs directly, post-dsymutil I expect you'll have inlined all
> > the debug info so there are no skeletons, etc, if I'm understanding your
> > design correctly.
>
> Right, we need all of it :-)
>
> -- adrian
>
> >
> > - David
> >
> >>
> >> -- adrian
> >>>
> >>> >
> >>> > If you need submodule import lists, then each DW_AT_module representing
> >>> > a submodule would have a name (anything else?) and the signature
> >>> > refering to its module skeleton CU.
> >>>
> >>> What I’m envisioning is
> >>>
> >>> .debug_info:
> >>> DW_TAG_compile_unit
> >>> ...
> >>> DW_TAG_imported_module
> >>> // import FooSubA
> >>> DW_AT_import [DW_FORM_ref4] (0x60)
> >>>
> >>> DW_TAG_module
> >>> DW_AT_name(“FooLib”)
> >>> DW_AT_LLVM_sysroot(“/“)
> >>> DW_AT_LLVM_include_dirs(“-I/path”)
> >>> DW_AT_LLVM_macros(“-DNDEBUG”)
> >>> 0x60:
> >>> DW_TAG_module
> >>> DW_AT_name(“FooSubA”)
> >>> // need not be emitted if not referenced.
> >>> DW_TAG_module
> >>> DW_AT_name(“FooSubASubA”)
> >>>
> >>> // need not be emitted if not referenced.
> >>> DW_TAG_module
> >>> DW_AT_name(“FooSubB”)
> >>>
> >>>
> >>>
> >>> -- adrian
> >>
> >>
> >
>
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits