On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
>
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.
>
> I use small testcase below to gather some numbers to help drive this
> discussion.
>
> [ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
> #define MAX_NUM_MSGS 5
>
> enum node_type
> {
>   INIT_TYPE = 0,
>   COMM_TYPE = 1,
>   COMP_TYPE = 2,
>   MSG_TYPE = 3,
>   RELEASE_TYPE = 4,
>   MAX_NODE_TYPE
> };
>
> typedef struct node_payload
> {
>   unsigned short npay_offset;
>   const char * npay_msg;
>   unsigned int npay_nelems;
>   struct node_payload * npay_next;
> } node_payload;
>
> typedef struct node_property
> {
>   int timestamp;
>   char category;
>   long initvalue;
> } node_property_t;
>
> typedef struct node
> {
>   enum node_type ntype;
>   int nmask:5;
>   union
>   {
>     struct node_payload * npayload;
>     void * nbase;
>   } nu;
>   unsigned int msgs[MAX_NUM_MSGS];
>   node_property_t node_prop;
> } Node;
>
> Node s;
>
> int main (void)
> {
>   return 0;
> }
>
> Note that in this case, there is nothing that the de-duplicator has to do
> (neither for the TYPE comdat sections nor CTF types). I chose such an example
> because de-duplication of types is orthogonal to the concept of representation
> of types.
>
> So, for the small C testcase with a union, enum, array, struct, typedef etc, I
> see the following sizes:
>
> Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
>  .debug_aranges      48         0
>  .debug_info        150         0
>  .debug_abbrev      314         0
>  .debug_line         73         0
>  .debug_str         455         0
>  .debug_ranges       32         0
>  .debug_types       578         0
>
> Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
>  .debug_aranges      48         0
>  .debug_info        732         0
>  .debug_abbrev      309         0
>  .debug_line         73         0
>  .debug_str         455         0
>  .debug_rnglists     23         0
>
> Compile with -gt (size -A <binary> excerpt):
>  .ctf               966         0
>  CTF strings sub-section size (ctf_strlen in disassembly) = 374
>  ==
>  CTF section just for representing types = 966 - 374 = 592 bytes
>  (The 592 bytes include the CTF header and other indexes etc.)
>
> So, following points are what I would highlight. Hopefully this helps you see
> that CTF has promise for the task of representing type debug info.
>
> 1. Type Information layout in sections:
>    A .ctf section is self-sufficient to represent types in a program. All
>    references within the CTF section are via either indexes or offsets into
>    the CTF section. No relocations are necessary in CTF at this time. In
>    contrast, DWARF type information is organized in multiple sections -
>    .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
>    .debug_types in DWARF4.
>
> 2. Type Information encoding / compactness matters:
>    Because the type information is organized across sections in DWARF (and
>    contains some debug information like location etc.), it is not feasible
>    to put a distinct number to the size in bytes for representing type
>    information in DWARF. But the size info of sections shown above should
>    be helpful to show that CTF does show promise in compactly representing
>    types.
>
>    Let's see some size data.
>    CTF string table (= 374 bytes) is left out of the
>    discussion at hand because it will not be fair to compare with .debug_str
>    section which contains other information than just names of types.
>
>    The 592 bytes of the .ctf section are needed to represent types in CTF
>    format. Now, when using DWARF5, the type information needs 732 bytes in
>    .debug_info and 309 bytes in .debug_abbrev.
>
>    In DWARF (when using -fdebug-types-section), the base types are duplicated
>    across type units. So for the above example, the DWARF DIE representing
>    'unsigned int' will appear in both the DWARF trees for types - node and
>    node_payload. In CTF, there is a single lone type 'unsigned int'.
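
As a quick cross-check, the section sizes quoted above can be tallied with a few lines of C. This is only a sketch: the byte counts are copied from the `size -A` excerpts in the quoted message, while the constant and function names are made up for illustration. Note that the DWARF totals include non-type information (locations etc.), as the quoted text itself points out, so they are upper bounds rather than exact "type only" sizes.

```c
#include <assert.h>

/* Byte counts copied from the quoted `size -A` excerpts.
   All identifiers here are hypothetical, chosen for this sketch. */
enum {
  CTF_SECTION_BYTES = 966,  /* whole .ctf section (-gt)              */
  CTF_STRTAB_BYTES  = 374,  /* CTF strings sub-section (ctf_strlen)  */
  DW5_INFO_BYTES    = 732,  /* .debug_info,   -gdwarf-5              */
  DW5_ABBREV_BYTES  = 309,  /* .debug_abbrev, -gdwarf-5              */
  DW4_INFO_BYTES    = 150,  /* .debug_info,   -gdwarf-4              */
  DW4_ABBREV_BYTES  = 314,  /* .debug_abbrev, -gdwarf-4              */
  DW4_TYPES_BYTES   = 578   /* .debug_types,  -gdwarf-4              */
};

/* CTF bytes used for types once the string table is excluded. */
int ctf_types_bytes(void)
{
  assert(CTF_SECTION_BYTES >= CTF_STRTAB_BYTES);
  return CTF_SECTION_BYTES - CTF_STRTAB_BYTES;
}

/* DWARF5: type information lives in .debug_info and .debug_abbrev. */
int dwarf5_types_bytes(void)
{
  return DW5_INFO_BYTES + DW5_ABBREV_BYTES;
}

/* DWARF4 with -fdebug-types-section: .debug_types is counted as well. */
int dwarf4_types_bytes(void)
{
  return DW4_INFO_BYTES + DW4_ABBREV_BYTES + DW4_TYPES_BYTES;
}
```

Calling `ctf_types_bytes()` reproduces the 592-byte figure from the quoted message; the two DWARF helpers simply add up the sections the message names for each DWARF version, with .debug_str left out on both sides.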

It's not clear to me why you are using -fdebug-types-section for this
comparison?  With just -gdwarf-4 I get

.debug_info      292
.debug_abbrev    189
.debug_str       299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section
also has a quite large portion for GCC version and arguments, which is 93
bytes.  So overall the DWARF representation should clock in at less than
700 bytes, closer to 650.

Richard.

> 3. Type Information retrieval and handling:
>    CTF type information is organized as a linear array of CTF types. CTF
>    types have references to other CTF types. libctf facilitates name
>    lookups, i.e. given the name of the type, get the type information.
>
>    DWARF type information is organized in a tree of DIEs. The information at
>    the leaf DIEs (base types) across DWARF type units is often duplicated.
>    DWARF type units do have references to other type units for larger types
>    though. In the example, the DWARF type unit for node has a reference to
>    the DWARF type unit for node_payload.
>
>    I only state the above for sake of observation, I don't know for certain
>    if one format is necessarily better or worse for consumers of type debug
>    information at this time WRT runtime access patterns.
>
>    On a related note though, it's not clear to me how .debug_types
>    integration with split-dwarf works out. If the linker does not see the
>    non-relocation-necessary part of the DWARF, I am not sure how
>    .debug_types type units are de-duplicated when using split-dwarf.
>
> Thanks
> Indu
>
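
To make the "linear array of types with references to other types" idea from point 3 concrete, here is a toy model in C. It is an assumption-laden sketch, not the real CTF encoding and not the libctf API: the `toy_type` record, the `toy_lookup_by_name` and `toy_resolve` helpers, and the linear name scan are all invented for illustration, and it makes no claim about how libctf actually implements lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified type record; NOT the real CTF layout. */
enum toy_kind { TOY_INTEGER, TOY_STRUCT, TOY_POINTER, TOY_TYPEDEF };

struct toy_type {
  const char *name;    /* NULL for unnamed types such as pointers    */
  enum toy_kind kind;
  int ref;             /* index of the referenced type, -1 if none   */
};

/* All types sit in one flat array; a reference is just an index. */
static const struct toy_type toy_types[] = {
  /* 0 */ { "unsigned int",        TOY_INTEGER, -1 },
  /* 1 */ { "struct node_payload", TOY_STRUCT,  -1 },
  /* 2 */ { NULL,                  TOY_POINTER,  1 },  /* struct node_payload * */
  /* 3 */ { "node_payload",        TOY_TYPEDEF,  1 },
};

#define TOY_NTYPES (sizeof toy_types / sizeof toy_types[0])

/* Name lookup: returns the index of the named type, or -1 if absent.
   A linear scan is used here purely for simplicity. */
int toy_lookup_by_name(const char *name)
{
  for (size_t i = 0; i < TOY_NTYPES; i++)
    if (toy_types[i].name != NULL && strcmp(toy_types[i].name, name) == 0)
      return (int) i;
  return -1;
}

/* Follow pointer/typedef references until a type that refers to nothing. */
int toy_resolve(int idx)
{
  assert(idx >= 0 && (size_t) idx < TOY_NTYPES);
  while (toy_types[idx].ref != -1)
    idx = toy_types[idx].ref;
  return idx;
}
```

In this model 'unsigned int' exists exactly once at index 0 and anything that needs it would refer to that index, which mirrors the "single lone type" remark in the quoted text. For example, `toy_resolve(toy_lookup_by_name("node_payload"))` walks from the typedef to the struct record at index 1.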