On Tue, May 5, 2015 at 3:14 PM David Blaikie <[email protected]> wrote:
> On Tue, May 5, 2015 at 3:01 PM, Adrian Prantl <[email protected]> wrote: > >> >> > On May 5, 2015, at 9:59 AM, David Blaikie <[email protected]> wrote: >> > >> > >> > >> > On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <[email protected]> >> wrote: >> > >> >>> On May 4, 2015, at 1:31 PM, David Blaikie <[email protected]> wrote: >> >>> >> >> ... >> >>>> >>> So you're going to need to implement fission (to at least some >> degree) support in LLDB, then? (to support the case where you haven't >> linked debug info with llvm-dsymutil, but you've hit one of these lookup >> problems where you need to cross possibly-conflicting modules) >> >>>> >> >> >>>> >> Yes. Specifically, it won’t support type units, and it will look >> up types by name rather than by signature. (cf. the second part of >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html >> ) >> >>>> > >> >>>> > How are you going to reference the types in the module's fission >> CU without type units/signatures? Are you going to emit type declarations >> into the normal CU and rely on the debugger to know that these declarations >> can be resolved by looking elsewhere? (just without the benefit of >> constraining that search to just looking for a matching TU?) >> >>>> >> >>>> If you look at the example in >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html, >> there will be an external type index (using the usual accelerator table >> format) that maps an external type’s UID to a pcm. In the pcm there is an >> extra accelerator table entry that maps UID to DIE offset. >> >>> >> >>> I mean I guess that's up to you, but seems like a relatively large >> workaround compared to supporting type units... (I mean certainly seems >> like strictly less work to do the workaround than implementing type units >> in LLDB, but a relatively large amount of work to do/throw away eventually >> once LLDB supports type units) >> >> >> >> It’s not primarily meant to be a workaround so LLDB doesn’t need to >> implement type units; the UIDs double as the key (decl context + name) to >> import types directly from the AST. The other advantage is that we won’t >> have to worry about MD5 hash collisions, but that’s more of a theoretical >> advantage. >> > >> > If there's a concern about collisions, we should fix that regardless >> (because we'll have the same problem with normal type units or modularized >> type units). >> >> My understanding is that split DWARF is disambiguating hash collisions by >> adding the full decl context and name to both the definition and the >> forward declaration: >> > > No, that's not a requirement. It's just an aspect of our implementation so > we can put entities in the type when needed (such as for putting member > function declarations in there to refererence from out of line member > function definitions) > > In the simplest case, you can reference a type unit directly: > > DW_TAG_variable > DW_AT_type [DW_FORM_ref_sig8] .... > > Without any DW_TAG_structure_type/namespaces/etc. Once we have Bag O' > DWARF, we'd move to that representation almost entirely. (the signature > we're using for this is the mangled name of the type so it includes the > decl context in the hash anyway) > > >> >> so if a module defines: >> namespace A { struct B { struct C {} } } >> we’ll end up with >> >> DW_TAG_variable >> DW_AT_name “myC” >> DW_AT_type [DW_FORM_ref4] 0x20 >> >> DW_TAG_namespace >> DW_AT_name “A” >> DW_TAG_structure_type >> DW_AT_name “B” >> 0x20: >> DW_TAG_structure_type >> DW_AT_name “C” >> DW_AT_signature 0x1234ABCD >> DW_AT_declaration >> >> >> and in the DWO there will be: >> >> >> DW_TAG_type_unit 0x1234ABCD >> DW_TAG_namespace >> DW_AT_name “A” >> DW_TAG_structure_type >> DW_AT_name “B” >> >> DW_TAG_structure_type >> DW_AT_name “C” >> ... >> >> ---------- >> >> (for completeness, the UID type reference would look like this: >> DW_TAG_variable >> DW_AT_name “myC” >> DW_AT_type [DW_FORM_strp] “_ZTN1AS1BS1C” ) >> >> >> >> So I think that hash collisions are actually handled for normal fission >> and we don’t need to worry about them. The bag o’dwarf will still contain >> the full decl context in the DWO, so it will be fine there, too. >> >> > >> > If we've already got (& have to use in the case of DWARF fallback for >> modules) to support a hash or other numeric identifier, it might be good to >> use that for everything rather than having two mechanisms? >> >> Generally, one way to do it is of course better than maintaining two :-) >> But using the UIDs in the .o file allows us to elide the the forward >> declarations from the .o file > > > It doesn't, though - because we need the forward declarations to put > members into (such as member function definitions - this'll come up even > for modules quite commonly with any inline member function, for example - > its definition will appear in the .o file and the definition > DW_TAG_subprogram will need to DW_AT_specification pointing to the > declaration of the member function - which is a DW_TAG_subprogram inside > the DW_TAG_class that is a declaration with a signature that refers to the > type unit). > > Only once we've got Bag O' DWARF do we remove the need for these > declarations. > > Granted we can get rid of them in /some/ cases today - when there's no > need to reference members in the type (or nest other members - like a > nested class, for example - though we could possibly workaround even that > with Bag O DWARF by having the type in the type unit have a declaration of > the nested type - then reference that from the nested type definition in > another type unit or .o file, etc). > > I don't think there's anything you can leverage here that's different from > what type units need. If you're going to do anything different from the way > type units work today, I'd really like to discuss that further and > understand what's different here that benefits from/allows for that > difference that doesn't apply to type units themselves. > > >> (granted the UIDs carry the same information, so it’s not actually as >> small as it looks like) so all that dsymutil needs to do is replace the >> string references with FORM_ref_addrs, no further stripping needed. >> >> >> > >> >> >> >>> >> >>> >> >>> >> >> >>> >> >> >>> >>> >> >>> >>> OK, so I think it's probably reasonable for now to just add >> DW_TAG_modules to the CU for each referenced module (or does it have to be >> each referenced submodule? (can two submodules within a single module be >> contradictory/conflicting?)). Since we don't have any good way to reference >> the module is a foreign unit while deduplicating that unit... there's not >> much point having the imported_module - but if you think it adds anything, >> I'm open to ideas. >> >>> >> It could help keeping things simpler. >> >>> >> Emitting it doesn’t add much semantic value because module imports >> always occur at the top level, but it will make the transition to the >> deduplicated TAG_modules easier — It could be easier to teach consumers >> once about imported_module({ref to TAG_module}) rather than having them >> also recognize top-level TAG_modules as an intermediate step. It’s also >> slightly easier to implement in LLVM because the imported_module allows us >> to anchor the TAG_module in the CU, but that’s not a very strong argument. >> >>> > >> >>> > Agreed on all counts (not a strong argument, but convenient enough, >> etc, etc). >> >>> > >> >>> > I'm still not entirely sure what the right answer is here, though, >> which is why I'm hesitant to bake anything in too strongly. >> >>> > >> >>> > To come back to one of the outstanding questions: Do you need >> submodule import information, or just module level (if modules cannot have >> internal conflicts and you can't avoid cross-module conflicts just by lack >> of visibility (I have no idea if either of those things are true) then you >> may just need per-module not per-submodule info)? >> >>> >> >>> At the moment I do not think that it makes sense for two submodules >> to conflict, but there is nothing in the clang documentation that >> explicitly forbids this. With this in mind, I think it is reasonable to not >> support submodules (at least initially) and always emit an import for the >> parent module. >> >>> Thats what I wanted to write ... but I as I’m browsing through our >> documentation, >> http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly >> gives an example of two conflicting submodules, so maybe this is not a >> reasonable simplification after all. On the other hand, a quick grep over >> all system module maps on OS X doesn’t show a single conflict declaration. >> >>> >> >>> I still believe we do not need to support submodules right from the >> start, but we should have a story for getting there if we need to. >> >>> >> >>> Given the simple example that demonstrates the possibility, it seems >> fair to have a story for what that looks like, yes - even if a first >> pass/prototype doesn't support it. >> >> >> >> Sure. >> >>> >> >>> >> >>> > >> >>> > Also, does each submodule need different special attributes/flags? >> If the special codegen attributes you want are at the module level, it'd >> probably be best to keep those on the Skeleton CU for the module (that will >> be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated >> by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then >> just have a DW_AT_signature attribute or something similarly small/trivial >> to point to the skeleton CU. >> >>> >> >>> The attributes are derived from cc1 command line arguments. Not two >> submodules imported by one CU can have different attributes. All submodules >> in a pcm also share their attributes. Putting them into the skeleton CU >> appears to be the most efficient place to put them, though perhaps not the >> most logical one. >> >>> >> >>> Why not the most logical? It'd be nice if it were a DW_TAG_module >> instead of a DW_TAG_compile_unit - but given the limited vocabulary we have >> in DWARF top level tags, it seems as good as we can have. >> >> >> >> I tend to view the module configuration (include path, isysroot, >> configuration macros) to be a part of the module and not a part of the >> skeleton that points to the split debug info for that module. >> > >> > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way >> or another. Either way the debugger's going to have to jump the >> >> [there’s something missing towards the end] >> No it’s not a workaround. Having the full module configuration is what >> allows LLDB to rebuild the module if the module cache has been cleared. >> > >> >> A module is uniquely identified by name + configuration. That’s why I >> feel it should be part of the tag that also holds the name. >> >> >> >>> >> >>> I would prefer to stick the attributes on the (top-level) >> DW_TAG_module and later deduplicate the attributes together with the >> DW_TAG_module. Sticking them on the skeleton won’t save any space in the .o >> files and would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and >> isysroot - 1x FORM_ref_sig_8) per CU and imported module. >> >>> >> >>> Seems nicer not to duplicate them, especially since not everyone will >> be using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux >> won't have that convenience). Eventually we can use Bag O' DWARF for the >> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that >> as a top-level tag, if desired/useful - I'm not sure it adds a lot) and >> have the imported_module reference it that way. (DW_TAG_imported_module, >> DW_AT_import, DW_FORM_ref_sig8) >> >>> >> >>> I'm not /hugely/ invested in this, but we do have people caring about >> LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and >> dsymutil, etc, seems valuable. >> >> >> >> I think that this would be an unnecessary intermediate step that we >> eventually want to migrate away from anyway. We already identified that the >> good solution for deduplication is going to be a skeleton TAG_module, so my >> view is that it is not worth the trouble adding a temporary indirection >> (and a new attribute name) >> > >> > New attribute name? >> >> What attribute would we put into the TAG_module to refer to the skeleton >> CU? >> > > I imagine we'd just match by name in that instance. > > (& eventually use a sig8, etc) > > >> >> > >> >> to save 4 bytes in the intermediate step. >> > >> > The debugger's going to need to resolve the skeleton anyway (in the >> case of non-AST based debugging) so I'm not sure how much it's an extra >> step. >> >> My “intermediate step” was referring to the non-dedup’ed TAG_module that >> is not (yet) part of the dedup’ed skeleton CU (but will be in the next >> iteration). I just didn’t think that it is worth to optimize a >> representation that will be changed soon anyway. >> > > It doesn't seem to cost much, but I'll give up on this. > I'm not so willing to give up so quickly on this. I don't necessarily believe "changed soon" for a lot of things w.r.t. dwarf. :) -eric > > >> >> > >> >> I don’t actually think there is anything about the TAG_module design >> tying this to either MachO or dsymutil, but let me know if you feel >> otherwise. >> > >> > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb >> story you probably don't need to consult the skeleton debug info (actually >> I forgot, you do - in the case where you need a name from a module that >> might be incompatible (it's not referenced directly in this CU)) - >> pre-dsymutil, you'd use the ASTs directly, post-dsymutil I expect you'll >> have inlined all the debug info so there are no skeletons, etc, if I'm >> understanding your design correctly. >> >> Right, we need all of it :-) >> >> -- adrian >> >> > >> > - David >> > >> >> >> >> -- adrian >> >>> >> >>> > >> >>> > If you need submodule import lists, then each DW_AT_module >> representing a submodule would have a name (anything else?) and the >> signature refering to its module skeleton CU. >> >>> >> >>> What I’m envisioning is >> >>> >> >>> .debug_info: >> >>> DW_TAG_compile_unit >> >>> ... >> >>> DW_TAG_imported_module >> >>> // import FooSubA >> >>> DW_AT_import [DW_FORM_ref4] (0x60) >> >>> >> >>> DW_TAG_module >> >>> DW_AT_name(“FooLib”) >> >>> DW_AT_LLVM_sysroot(“/“) >> >>> DW_AT_LLVM_include_dirs(“-I/path”) >> >>> DW_AT_LLVM_macros(“-DNDEBUG”) >> >>> 0x60: >> >>> DW_TAG_module >> >>> DW_AT_name(“FooSubA”) >> >>> // need not be emitted if not referenced. >> >>> DW_TAG_module >> >>> DW_AT_name(“FooSubASubA”) >> >>> >> >>> // need not be emitted if not referenced. >> >>> DW_TAG_module >> >>> DW_AT_name(“FooSubB”) >> >>> >> >>> >> >>> >> >>> -- adrian >> >> >> >> >> > >> >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
