On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
>
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.
>
> I use the small testcase below to gather some numbers to help drive this
> discussion.
>
> [ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
> #define MAX_NUM_MSGS 5
>
> enum node_type
> {
>    INIT_TYPE = 0,
>    COMM_TYPE = 1,
>    COMP_TYPE = 2,
>    MSG_TYPE = 3,
>    RELEASE_TYPE = 4,
>    MAX_NODE_TYPE
> };
>
> typedef struct node_payload
> {
>    unsigned short npay_offset;
>    const char * npay_msg;
>    unsigned int npay_nelems;
>    struct node_payload * npay_next;
> } node_payload;
>
> typedef struct node_property
> {
>    int timestamp;
>    char category;
>    long initvalue;
> } node_property_t;
>
> typedef struct node
> {
>    enum node_type ntype;
>    int nmask:5;
>    union
>      {
>        struct node_payload * npayload;
>        void * nbase;
>      } nu;
>      unsigned int msgs[MAX_NUM_MSGS];
>      node_property_t node_prop;
> } Node;
>
> Node s;
>
> int main (void)
> {
>    return 0;
> }
>
> Note that in this case, there is nothing that the de-duplicator has to do
> (neither for the TYPE comdat sections nor CTF types). I chose such an example
> because de-duplication of types is orthogonal to the concept of representation
> of types.
>
> So, for the small C testcase with a union, enum, array, struct, typedef etc., I
> see the following sizes:
>
> Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
>      .debug_aranges     48         0
>      .debug_info       150         0
>      .debug_abbrev     314         0
>      .debug_line        73         0
>      .debug_str        455         0
>      .debug_ranges      32         0
>      .debug_types      578         0
>
> Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
>      .debug_aranges      48         0
>      .debug_info        732         0
>      .debug_abbrev      309         0
>      .debug_line         73         0
>      .debug_str         455         0
>      .debug_rnglists     23         0
>
> Compile with -gt (size -A <binary> excerpt):
>      .ctf      966     0
>      CTF strings sub-section size (ctf_strlen in disassembly) = 374
>      ==> CTF section just for representing types = 966 - 374 = 592 bytes
>      (The 592 bytes include the CTF header and other indexes etc.)
>
> So, the following points are what I would highlight. Hopefully this helps you
> see that CTF has promise for the task of representing type debug info.
>
> 1. Type Information layout in sections:
>     A .ctf section is self-sufficient to represent types in a program. All
>     references within the CTF section are via either indexes or offsets into
>     the CTF section. No relocations are necessary in CTF at this time. In
>     contrast, DWARF type information is organized in multiple sections -
>     .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
>     .debug_types in DWARF4.
>
> 2. Type Information encoding / compactness matters:
>     Because the type information is organized across sections in DWARF (and
>     contains some debug information like location etc.), it is not feasible
>     to put a distinct number to the size in bytes for representing type
>     information in DWARF. But the size info of sections shown above should
>     be helpful to show that CTF does show promise in compactly representing
>     types.
>
>     Let's look at some size data. The CTF string table (= 374 bytes) is left
>     out of the discussion at hand because it would not be fair to compare it
>     with the .debug_str section, which contains more than just names of types.
>
>     The 592 bytes of the .ctf section are needed to represent types in CTF
>     format. Now, when using DWARF5, the type information needs 732 bytes in
>     .debug_info and 309 bytes in .debug_abbrev.
>
>     In DWARF (when using -fdebug-types-section), the base types are duplicated
>     across type units. So for the above example, the DWARF DIE representing
>     'unsigned int' will appear in both the DWARF trees for types - node and
>     node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info    292
.debug_abbrev  189
.debug_str     299

This contains all the info CTF provides (and more).  It sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
spends quite a large portion, 93 bytes, on the GCC version and arguments.
So overall the DWARF representation should clock in at less than 700 bytes,
closer to 650.
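
For anyone who wants to re-check the arithmetic, here is a minimal C sketch. The byte counts are copied from the `size -A` excerpts in this thread; the enum and function names are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Section sizes in bytes, copied from the size -A excerpts in this thread. */
enum {
    CTF_TOTAL        = 966, /* .ctf, built with -gt */
    CTF_STRINGS      = 374, /* ctf_strlen */
    DW4_INFO         = 292, /* .debug_info, plain -gdwarf-4 */
    DW4_ABBREV       = 189, /* .debug_abbrev */
    DW4_STR          = 299, /* .debug_str */
    DW4_PRODUCER_STR = 93   /* GCC version and arguments in .debug_str */
};

/* Sum an array of section sizes. */
unsigned sum_sizes(const unsigned *sizes, size_t n)
{
    unsigned total = 0;
    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total;
}

/* Plain -gdwarf-4: .debug_info + .debug_abbrev + .debug_str. */
unsigned dwarf4_total(void)
{
    const unsigned s[] = { DW4_INFO, DW4_ABBREV, DW4_STR };
    return sum_sizes(s, sizeof s / sizeof s[0]);
}

/* The same total with the producer string left out. */
unsigned dwarf4_without_producer(void)
{
    return dwarf4_total() - DW4_PRODUCER_STR;
}

/* The .ctf section with its string table left out. */
unsigned ctf_without_strings(void)
{
    return CTF_TOTAL - CTF_STRINGS;
}
```

With these inputs, dwarf4_total() is 780 and dwarf4_without_producer() is 687, against 966 bytes for the whole .ctf section and 592 for .ctf without its strings. The estimate of around 650 additionally assumes the location information is stripped, which this sketch does not model.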

Richard.

> 3. Type Information retrieval and handling:
>     CTF type information is organized as a linear array of CTF types. CTF
>     types have references to other CTF types. libctf facilitates name
>     lookups, i.e. given the name of the type, get the type information.
>
>     DWARF type information is organized in a tree of DIEs. The information at
>     the leaf DIEs (base types) across DWARF type units is often duplicated.
>     DWARF type units do have references to other type units for larger types
>     though. In the example, the DWARF type unit for node has a reference to
>     the DWARF type unit for node_payload.
>
>     I only state the above for the sake of observation; I don't know for
>     certain if one format is necessarily better or worse for consumers of
>     type debug information at this time WRT runtime access patterns.
>
>     On a related note though, it's not clear to me how .debug_types
>     integration with split-dwarf works out. If the linker does not see the
>     non-relocation-necessary part of the DWARF, I am not sure how
>     .debug_types type units are de-duplicated when using split-dwarf.
>
> Thanks
> Indu
>
