On Thu, May 4, 2017 at 6:22 PM Robinson, Paul <paul.robin...@sony.com> wrote:
> I think it's pretty safe to say:
>
> - A reference into a TU from a CU or a different TU is invariably
> by ref_sig8, never by section offset.
>
> - A reference into a CU from another CU has to be by ref_addr; in
> a .o file this can use a relocation, in a .dwo file it has to be from
> inside the same .debug_info contribution.
>
> - A reference into a CU from a TU is not allowed, even if the TU
> lives in the same .debug_info contribution.
>
> I don't have my hands on words in the document that say these things, but
> I am quite sure that's the intent. It's not important whether object-file
> mechanics would allow you to do the things that aren't allowed above.
>

Fair points, all - I'd be curious to see the wording, for sure. From what I could read it certainly seemed implied/assumed, but not explicit. I was mentioning this mostly by way of opportunistically suggesting "it might be a thing that would be good to think explicitly about, possibly allow, and design this new stuff around the possibility, perhaps".

>
> The rest of what I'm suggesting all follows (reiterating for clarity):
>
> - If a .dwo file has multiple split-full CUs, they each have a
> unique DWO ID (so the index can describe them individually).
>
> - Therefore, the corresponding .o has a distinct corresponding
> skeleton CU for each split-full CU.
>
> - Cross-CU references within a .dwo file are by DW_FORM_ref_addr
> and the related CUs must be in the same .debug_info contribution.
>
> - Split-full CUs without cross-CU references can be in separate
> .debug_info contributions within the .dwo file.
>

Adding semantics to "contributions" in the .dwo file seems like a big step that wasn't present before I think. It would mandate the use of (at least 2) sections when using Fission+type units, reducing some compression opportunity.
(yeah, in theory, implementation detail - a platform could agree to some other format where, say, .debug_info.dwo starts with an int specifying how long the CU prefix is, after which there are type units)

> - A packager should look for multiple CUs in a .debug_info
> contribution, be willing to create an index entry for each one, and not
> split up the contribution even if one or more of the CUs has already been
> included from elsewhere.
>

But only for CUs, not TUs, right. So as long as the producer used a separate section chunk/"contribution" for the TUs (even in the v5 TU/CU unification into the debug_info section) then the packager could continue to fragment a chunk containing only TUs, but if it contained any CUs that chunk would be indivisible.

> - A packager can drop an entire .debug_info contribution if *all*
> of the CUs in that contribution have been included from elsewhere. (This
> trivially covers the one-CU-per-contribution case.)
>
> - The package index should get a new column to describe the entire
> .debug_info contribution containing the CU, so that consumers can know how
> to resolve DW_FORM_ref_addr.
>
> You're probably still thinking of wrinkles I haven't addressed; let me
> know.
>

Not much really - it about covers it, I think it just gets a little hairy in spots mentioned above.

- Dave

> --paulr
>
> *From:* David Blaikie [mailto:dblai...@gmail.com]
> *Sent:* Thursday, May 04, 2017 5:30 PM
> *To:* Robinson, Paul; dwarf-discuss@lists.dwarfstd.org; Eric Christopher
> *Subject:* Re: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)
>
> On Thu, May 4, 2017 at 5:05 PM Robinson, Paul <paul.robin...@sony.com>
> wrote:
>
> Skeleton units are pretty small; it's a 20-byte header, plus values for
> the compile_unit DIE, which is spec'd to have no children. I would not be
> concerned about space there. And having unique DWO IDs per unit seems
> pretty useful.
>
> I tend to agree, though - what sort of uses do you have in mind?
>
> A unique DWO ID per unit lets each DWO unit have a distinct entry in the
> index… saves the consumer the trouble of having to read the .debug_info
> section to find the units.
>
> Yep
>
> If you want to require consumers to do more work, you can make DWO IDs
> be per-file instead of per-unit, and then there's no need for an INFO_FILE
> column because the INFO column would necessarily have to cover the entire
> .debug_info section from that file.
>
> Yep - time/space tradeoff, and I'd probably err on the side of time myself
> (by having separate skeletons and cu_index entries for each CU) as you've
> suggested. Just floating the other as an alternative since it did come up.
>
> In non-split DWARF, type units are spec'd to have their own object-file
> section contributions, separate from the compile unit(s);
>
> That's sort of an implementation detail though, isn't it? DWARF just talks
> about bytes in sections (type units go in the debug_types section (or, now,
> the debug_info section)) and, yeah, you can use comdat groups and separate
> chunks of debug_types sections to deduplicate them, but I don't think DWARF
> requires/speaks about that, does it?
>
> Actually DWARF 5 Appendix E does describe this; not as a required tactic,
> but a way to achieve the useful effect of deduplicating type units etc. So
> yes it was overstating the case to say they are _spec'd_ to have their own
> contributions, but that would be what a producer would normally do.
>
> Ah, cool - thanks for the pointer about where the wording is.
>
> that's what lets type units have a COMDAT key and be uniqued by the
> linker, even though all those separate contributions have the same section
> name. (In ELF, you have multiple section headers with the same section
> name.) Surely the DWO file could be (is?) done the same way, with each
> type unit in its own contribution to the .debug_info section?
>
> Funny story about that...
>
> Heh.
> Which way works better (types all together, or each in their own
> contribution) depends on whether your packager wants to deduplicate in a
> linker-like way, based on COMDATs,
>
> I think neither GCC nor Clang used COMDATs in DWO files - but GCC still
> put them in separate sections (sections with the same name... ) - a weird
> beast to me, but apparently it's a thing that works.
>
> But yeah, for non-Fission, COMDATs seem solid, though do represent a
> limitation on compressibility, etc.
>
> or in a purpose-built way, by looking at the type-unit signatures
> directly. DWARF doesn't say you have to do one or the other, which
> provides implementation flexibility to the toolchain. If your packager is
> willing to look through TUs for signatures and deduplicate that way, then
> you can stuff all the TUs into one section contribution and get better
> compression. Quality of Implementation, as we like to say.
>
> To be sure!
>
> still there would need to be a special case where the TU's debug_info
> chunk would have an INFO_FILE contribution that represented the CU chunk. So
> a DWP tool would have to special case the info chunk that contained the CUs
> (& would have to require that there be only one if there are to be TU->CU
> references. I suppose if TU->CU references aren't supported
>
> I don't think TU->CU references are permitted.
>
> For now, with Fission, I agree.
>
> I think without Fission you could certainly use ref_addr to refer to
> something in a CU from a TU - /maybe/ even from a TU to another TU but I
> don't think so (not sure if the linker would do the right thing about
> reachability, etc - and if your TUs differed in layout, which they can even
> in Clang, that wouldn't work out well if it picked a different TU and
> either null'd out the ref_addr, or made it refer to the same offset in a
> different copy of the type (I don't think any linker/reloc construct would
> really result in this latter situation))
>
> Certainly you could not have a v4 split TU referencing a CU, that would be
> impossible.
>
> v4 didn't have split things (Fission being a v5 feature, I think), did it?
> What's the distinction you're drawing there?
>
> (Without relocations, you can't use DW_FORM_ref_addr to point from
> .debug_types to .debug_info; and DW_FORM_ref_sig8 is only for references to
> other type units.) While you could engineer the possibility in v5, because
> type units have moved back into .debug_info and in principle you could
> arrange for DW_FORM_ref_addr to do that, I am morally certain there was no
> intent to allow that.
>
> Right, I doubt there was any intent - but as we're choosing some new
> representations, etc, I'm wondering if it's something to think about.
>
> Even without the TU->CU reference question, the TU/CU unification still
> means that a DWP creation tool would have to special case the CUs in some
> way, or require the TUs to be placed in separate sections as GCC does it.
> (then it could treat each unit section as an indivisible blob)
>
> Maybe this fits into quality of implementation - but I think the presence
> of cross-unit references makes this a bit more of a matter for the standard
> as to how these groups are defined, where cross-CU references are resolved
> relative to, how type units can be dropped (or not), etc.
>
> I'm sort of leaning towards "ref_addr offsets are resolved relative to the
> widest range of CUs in a single section that contains the referring DIE" -
> though that is a bit of a mouthful/awkward thing to implement.
>
> - Dave
>
> --paulr
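
The three reference rules Paul lists at the top of the thread can be written down as a small predicate. This is a hypothetical Python sketch, not taken from the spec or from any tool: `UnitKind`, `reference_allowed`, and its parameters are made-up names, and it models only cross-unit references as described here (ref_sig8 into TUs, ref_addr within one .debug_info contribution for CU->CU in a .dwo, no TU->CU at all).

```python
from enum import Enum


class UnitKind(Enum):
    """Kind of unit at either end of a cross-unit reference (hypothetical model)."""
    COMPILE = "CU"
    TYPE = "TU"


def reference_allowed(src: UnitKind, dst: UnitKind, form: str,
                      in_dwo: bool, same_contribution: bool) -> bool:
    """Check a cross-unit reference against the rules proposed in the thread.

    - Into a TU (from a CU or another TU): only DW_FORM_ref_sig8.
    - Into a CU from a TU: never allowed.
    - Into a CU from another CU: DW_FORM_ref_addr; in a .dwo file there are
      no relocations, so the target must be in the same .debug_info
      contribution.
    """
    if dst is UnitKind.TYPE:
        return form == "DW_FORM_ref_sig8"
    if src is UnitKind.TYPE:
        return False  # TU -> CU is not allowed, even within one contribution
    if form != "DW_FORM_ref_addr":
        return False
    return same_contribution or not in_dwo


# A CU->CU ref_addr that crosses contributions is fine in a .o, not in a .dwo.
print(reference_allowed(UnitKind.COMPILE, UnitKind.COMPILE,
                        "DW_FORM_ref_addr", in_dwo=True, same_contribution=False))
```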
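
As a check on the "20-byte header" figure for skeleton units: in the 32-bit DWARF 5 format a skeleton unit header is unit_length (4 bytes), version (2), unit_type (1), address_size (1), debug_abbrev_offset (4) and dwo_id (8). The Python sketch below packs one with the standard `struct` module, assuming little-endian byte order; `pack_skeleton_header` is a hypothetical helper, and the DIE bytes that follow the header are not modeled.

```python
import struct

DW_UT_skeleton = 0x04  # DWARF 5 unit_type value for skeleton units

# 32-bit DWARF 5 skeleton unit header, little-endian, no padding:
# unit_length, version, unit_type, address_size, debug_abbrev_offset, dwo_id
SKELETON_HEADER = struct.Struct("<IHBBIQ")


def pack_skeleton_header(die_bytes: int, abbrev_offset: int, dwo_id: int,
                         address_size: int = 8) -> bytes:
    """Return header bytes for a skeleton unit whose DIEs occupy die_bytes.

    unit_length counts everything after the length field itself, i.e. the
    remaining 16 header bytes plus the DIEs.
    """
    unit_length = SKELETON_HEADER.size - 4 + die_bytes
    return SKELETON_HEADER.pack(unit_length, 5, DW_UT_skeleton,
                                address_size, abbrev_offset, dwo_id)


header = pack_skeleton_header(die_bytes=12, abbrev_offset=0,
                              dwo_id=0x0123456789ABCDEF)
print(len(header))  # 20
```

Because each split-full CU gets its own skeleton, a unique DWO ID per unit costs roughly this header plus the skeleton DIE's attribute values per CU in the .o file.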
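
Paul's packager rules (one index entry per CU, never split a .debug_info contribution that holds CUs, drop it only if *all* of its CUs are already present, and record the whole contribution in a new index column) can be sketched as below. This is a hypothetical Python model, not dwp's or any other tool's implementation: `Unit`, `Contribution`, `IndexEntry`, `pack_info`, and `resolve_ref_addr` are invented names, the `file_offset`/`file_size` pair stands in for the proposed new column (called INFO_FILE earlier in the thread), and type units are left out. It assumes a DW_FORM_ref_addr value in a .dwo is relative to the start of the original contribution.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Unit:
    dwo_id: int   # unique per split-full CU
    offset: int   # offset of the CU within its contribution
    length: int   # size of the CU in bytes


@dataclass
class Contribution:
    size: int          # size of this .debug_info.dwo contribution
    units: List[Unit]  # CUs it contains; the contribution is one blob


@dataclass
class IndexEntry:
    info_offset: int   # where the CU itself lands in the output section
    info_size: int
    file_offset: int   # proposed extra column: the whole containing contribution
    file_size: int


def pack_info(contribs: List[Contribution]) -> Tuple[Dict[int, IndexEntry], int]:
    """Lay out contributions in the output section and build the CU index."""
    index: Dict[int, IndexEntry] = {}
    out_size = 0
    for contrib in contribs:
        if all(u.dwo_id in index for u in contrib.units):
            continue  # every CU already included from elsewhere: drop it whole
        base = out_size
        for u in contrib.units:
            # Never split the blob; a CU seen earlier keeps its first entry.
            if u.dwo_id not in index:
                index[u.dwo_id] = IndexEntry(base + u.offset, u.length,
                                             base, contrib.size)
        out_size += contrib.size
    return index, out_size


def resolve_ref_addr(entry: IndexEntry, ref_addr: int) -> int:
    """Map a DW_FORM_ref_addr value (relative to the original contribution)
    to an offset in the packaged section."""
    if not 0 <= ref_addr < entry.file_size:
        raise ValueError("DW_FORM_ref_addr points outside the CU's contribution")
    return entry.file_offset + ref_addr


# c2 holds only CU B, already present, so it is dropped; c3 is kept whole
# because CU C is new, even though it carries a second copy of CU A.
c1 = Contribution(100, [Unit(0xA, 0, 60), Unit(0xB, 60, 40)])
c2 = Contribution(50, [Unit(0xB, 0, 50)])
c3 = Contribution(80, [Unit(0xA, 0, 30), Unit(0xC, 30, 50)])
index, total = pack_info([c1, c2, c3])
print(total)                            # 180
print(resolve_ref_addr(index[0xC], 0))  # 100: the copy of A inside c3
```

Keeping c3 whole means a ref_addr from C still lands on the copy of A it was compiled against, which is why the index needs the per-contribution column; the cost is that A's bytes appear twice in the package.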
_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org