On Thu, May 4, 2017 at 12:46 PM Robinson, Paul <paul.robin...@sony.com> wrote:
> I think David is correct, that we did not consider LTO and assumed a .dwo
> file would have a single compilation unit in the .debug_info section. It
> seems to me not hard to fix, but my idea would require an extension to the
> package-file index and I don't see provision in the package-file index for
> vendor extensions (another oversight?).
>

(jumping the gun a bit) - that said, an extension to the set of valid columns is backwards compatible-ish (yeah, an existing consumer might error on it, I suppose). But, yeah, might be a good opportunity to revisit & add versioning and a vendor extension point, maybe?

> In a split-DWARF scenario producing multiple CUs, it's clear that each
> split-full unit in the .dwo file would need a corresponding skeleton unit
> in the .o file,
>

Eric was tossing around an idea that would diverge from that - having a single skeleton for the whole DWO file (where the DWO file would contain multiple CUs - though the specifics there we hadn't hashed out - maybe every CU having the same DWO ID or the like), which could reduce the size of the debug info in object files in these situations. (LLVM produces a single debug_addr/debug_ranges/etc in Fission anyway - so every CU in a fission object file would include the same addr_base, ranges_base, the same abbrev offset, etc anyway)

But that'd probably be rather invasive a change to Fission? Not sure.

> with matching unique DWO IDs. The v5 spec basically already says that.
> With multiple split-full units in the same .debug_info section, then
> DW_FORM_ref_addr can support cross-CU references within the section; the
> producer can supply the correct offset within the section without needing
> any relocations.
>

Yep

> How to describe this in the package file? I'd leave DW_SECT_INFO meaning
> what it does now, describing the base and size of the individual unit. I'd
> add a new "section identifier" DW_SECT_INFO_FILE or whatever, which
> describes the base and size of the entire .debug_info section contributed
> by the .dwo *file* that the unit came from. This allows a consumer to find
> each individual unit by DWO ID, as today, and the extra _FILE column
> describes the base-and-size to use when interpreting a DW_FORM_ref_addr
> from that unit. For any .dwo file that contains only one unit,
> DW_SECT_INFO and DW_SECT_INFO_FILE would have the same values. The tool
> that creates the package file can omit DW_SECT_INFO_FILE from the index if
> every input .dwo file has only one unit.
>

Ah - that's a smart idea I hadn't considered & makes a lot of sense. This sounds like it could address my concern/idea around type units too... (will detail that a bit more later)

> This solution avoids the problem of the *consumer* having to scan the
> .debug_info contribution to find the units; that work can be done once up
> front by the packaging tool.
>

Yep - I don't have a good sense of how expensive such a scan is, but the index is relatively small, I think. (It would be good to measure what the index looks like for LLVM's ThinLTO, which will cause many more CUs to exist, because every primary compilation that imports a few functions from other CUs will get a separate CU for each CU it imports from.)

> Section identifiers are 32 bits wide, and the defined values are just 1-8;
> surely we can allocate some for vendor extensions!
>

Seems legit.

> And then it's no problem to have tools produce the new column for the
> index. Consumers will just ignore section identifiers that they don't
> recognize, same as any other part of DWARF.
>

For sure.

> Would that address the problem?
>

Sounds like it.

So here's some extra wrinkles/ideas:

Wrinkle 1: I think binutils DWP currently drops duplicate units (units with the same DWO ID). With this change, that wouldn't be possible - or at least all the /bytes/ of the DWO would have to be imported regardless, and in a contiguous chunk, so that cross-CU references would resolve correctly (if you had a DWO with 3 CUs in it, the middle of which turned out to be a duplicate & was dropped, then the offset from a DIE in CU 1 to a DIE in CU 3 (now 2) would be broken). I think that's probably OK - such a DWP can only have one entry in the cu_index for that signature - but it can't drop the bytes anymore... *shrug*

Wrinkle 2: Type units go in the debug_info section, but you really do want to be able to drop duplicate type units when creating a DWP (that being the point of type units). So maybe require that DWO files have all the CUs first, then the TUs? And the INFO_FILE range only applies to the range over the CUs? This hurts/walks back the unification of CUs and TUs by special casing, unfortunately... - other ideas?

Idea: I've been wondering about the idea of not putting types in type units if the producer knows there won't be duplicates (for example Clang's (& GCC's) vtable-based optimization - if a type has a vtable, only put the type definition where the vtable is emitted - well, if the key function is strong, then you know it's going in exactly one place... so why add the overhead of a type unit?). But that makes it awkward for types in type units that want to refer to these ununited types. A simple implementation could produce a declaration of the ununited type in the united type, but that's some overhead - an alternative would be to use ref_addr to refer to the ununited type in the CU - /assuming/ that ref_addr always refers to the debug_info section, not the debug_types section - or in the case of DWP, assuming that ref_addr is resolved relative to the new INFO_FILE range you're proposing (with my amendment above that it only apply to the (required to be) contiguous range of CUs).
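
To make the INFO_FILE idea a bit more concrete, here is a rough Python sketch of how a consumer might resolve a DW_FORM_ref_addr in a DWP. It is only an illustration: DW_SECT_INFO_FILE is the column proposed above (nothing implements it), the IndexRow type and helper names are made up, and it assumes the two LTO CUs from the index dump quoted below came from one .dwo whose .debug_info landed in the DWP contiguously as [2d, 98).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class IndexRow:
    """One cu_index row (hypothetical in-memory form, not a real API)."""
    dwo_id: int
    info: Tuple[int, int]       # DW_SECT_INFO: (offset, size) of this unit
    info_file: Tuple[int, int]  # proposed DW_SECT_INFO_FILE: (offset, size)
                                # of the whole .dwo .debug_info contribution


# Assumption: both LTO CUs share one contiguous contribution [0x2d, 0x98).
LTO_FILE = (0x2D, 0x98 - 0x2D)
ROWS = [
    IndexRow(0x7BD765349B7E7631, (0x2D, 0x65 - 0x2D), LTO_FILE),
    IndexRow(0x32DD6D7121DD1D9A, (0x65, 0x98 - 0x65), LTO_FILE),
]


def resolve_ref_addr(row: IndexRow, ref: int) -> int:
    """Map a ref_addr value (an offset from the start of the original
    .dwo .debug_info) to an offset in the DWP's .debug_info."""
    base, size = row.info_file
    if not 0 <= ref < size:
        raise ValueError("ref_addr points outside the .dwo contribution")
    return base + ref


def unit_containing(rows: Sequence[IndexRow], dwp_offset: int) -> Optional[IndexRow]:
    """Find the unit whose DW_SECT_INFO range covers a DWP offset."""
    for row in rows:
        base, size = row.info
        if base <= dwp_offset < base + size:
            return row
    return None


# A DIE in the second CU refers back to offset 0x10 of the original .dwo
# section (a made-up value that falls inside the first CU).
target = resolve_ref_addr(ROWS[1], 0x10)

# Using only the per-unit base, as today's index allows, gives the wrong place.
naive_target = ROWS[1].info[0] + 0x10
```

With the _FILE base the reference lands in the first CU's range; with only the per-unit base it stays inside the second CU, which is the breakage being discussed. The same arithmetic shows why Wrinkle 1 matters: if any bytes inside the contribution were dropped, base + ref would no longer point at the intended DIE.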
- Dave

>
> --paulr
>
> *From:* Dwarf-Discuss [mailto:dwarf-discuss-boun...@lists.dwarfstd.org]
> *On Behalf Of* David Blaikie
> *Sent:* Tuesday, May 02, 2017 12:10 PM
> *To:* dwarf-discuss@lists.dwarfstd.org
> *Subject:* [Dwarf-Discuss] Fission + cross-CU references (ref_addr)
>
> I've recently been trying to resolve the use of Fission in LLVM's ThinLTO
> mode (though this would apply to plain LTO too).
>
> One of the things that happens here is that cross-CU DIE references
> (DW_FORM_ref_addr) are used to describe inlining a function in one CU into
> another CU.
>
> This format has been implemented in LLVM and GCC for ~years and seems to
> work well outside of Fission.
>
> So the question is: what to do with Fission?
>
> It seemed to me that a good representation would be to produce multiple
> CUs into a single DWO file, which GDB can't yet consume, but I'm working on
> patches to help there. DW_FORM_ref_addr would not use any ELF relocation,
> but be assumed to be "relative to the chunk of debug_info it was in"
> (within the .dwo file)
>
> But what about DWP files? Currently binutils dwp produces records like
> this:
>
> (this dwp contains 3 CUs, two from one LTO compile, and one from a
> standalone compile linked in for comparison):
>
>   Index  Signature           INFO      ABBR      LINE      STR_OFF
>   -----  ------------------  --------  --------  --------  --------
>       2  0x7bd765349b7e7631  [2d, 65)  [38, ae)  [11, 22)  [14, 3c)
>       8  0x66f4e160661d2687  [00, 2d)  [00, 38)  [00, 11)  [00, 14)
>      11  0x32dd6d7121dd1d9a  [65, 98)  [38, ae)  [11, 22)  [14, 3c)
>
> So the ABBR/LINE/STR_OFF sections are kept as-is (no analysis is done to
> find which portions of the dwo file are used by which CUs, etc), but the
> INFO section is fragmented on the CU boundaries. Fragmenting the TYPES
> section on the TU boundaries is necessary/useful for deduplication of
> types, but this fragmenting of the CU makes it impossible (I think) to use
> ref_addr in a dwp file.
>
> If this fragmenting were not done - consumers (GDB, etc) would need to
> change to account for this - searching through the INFO range to find the
> CU matching the signature, rather than knowing it starts at the start of
> the INFO range. This could have a noticeable performance impact especially
> in a full LTO build (where /all/ the CUs were in the same .dwo - so the
> index would be entirely unhelpful, I think).
>
> Does all this sound right/sane - anyone have ideas/perspectives/thoughts
> on how this should work?
>
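
For a sense of what "searching through the INFO range" would involve, here is a rough Python sketch of a linear walk over unit headers looking for a DWO ID. It is a sketch under stated assumptions, not how GDB or any other consumer does it: it assumes 32-bit DWARF, little-endian data, and the DWARF v5 split-compile unit header layout (unit_length, version, unit_type, address_size, debug_abbrev_offset, dwo_id). With the pre-v5 GNU Fission format the DWO ID lives in an attribute on the CU DIE, so the scan would also have to decode abbreviations. make_unit and find_unit_by_dwo_id are made-up helper names.

```python
import struct
from typing import Optional

DW_UT_split_compile = 0x05   # DWARF v5 unit type code
HEADER_REST = "<HBBIQ"       # version, unit_type, address_size,
                             # debug_abbrev_offset, dwo_id (16 bytes)


def make_unit(dwo_id: int, body: bytes = b"") -> bytes:
    """Build a fake v5 split compile unit for the demo (DIE bytes are opaque)."""
    rest = struct.pack(HEADER_REST, 5, DW_UT_split_compile, 8, 0, dwo_id) + body
    return struct.pack("<I", len(rest)) + rest  # unit_length excludes itself


def find_unit_by_dwo_id(info: bytes, start: int, end: int,
                        wanted: int) -> Optional[int]:
    """Walk unit headers in info[start:end]; return the offset of the unit
    whose header carries the wanted DWO ID, or None if there is none."""
    off = start
    while off < end:
        (unit_length,) = struct.unpack_from("<I", info, off)
        if unit_length == 0xFFFFFFFF:
            raise ValueError("64-bit DWARF is not handled in this sketch")
        _version, unit_type, _asize, _abbrev, dwo_id = struct.unpack_from(
            HEADER_REST, info, off + 4)
        if unit_type == DW_UT_split_compile and dwo_id == wanted:
            return off
        off += 4 + unit_length  # skip to the next unit header
    return None


# Three units packed back to back, as in an unfragmented LTO contribution.
info = (make_unit(0xA, b"\x00" * 3)
        + make_unit(0xB, b"\x00" * 5)
        + make_unit(0xC))
third = find_unit_by_dwo_id(info, 0, len(info), 0xC)
```

Each lookup touches one header per unit ahead of the match, so a full-LTO .dwo with thousands of CUs would pay a linear cost per lookup unless the consumer caches the result of one pass, which is the work an index with per-unit entries already does up front.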