Hi,

Recently I have been working on generating true signatures
in DWARF in LLVM ([1]). Currently, the DWARF generated from
DISubprogram encodes the source-level signature. But compiler
optimization may modify a signature, e.g., from
  static int foo(int a, int b, int c) { ... }
to
  static int foo(int a, int c) { ... }
where the parameter 'int b' is removed by the compiler.

In most cases the signature does not change after
compiler optimization, but some function signatures
do change. Encoding those changed signatures in DWARF
helps users identify the true signature easily and
improves productivity.

Motivation
==========

My particular use case is BPF-based Linux kernel
tracing. When tracing a kernel function, the user needs
to know the actual signature in order to read the
arguments correctly.

For example, suppose the actual signature is
  static int foo(int a, int c) { ... }
and the source signature is
  static int foo(int a, int b, int c) { ... }

If users trace function foo() with the signature
  int foo(int a, int b, int c);
they may get incorrect results: the source-level
parameter 'int b' actually takes the value of the
true-signature parameter 'int c', and the source-level
parameter 'int c' takes a garbage value.

In this case, the true signature is essential for
correct tracing.
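
To make the mismatch concrete, here is a minimal sketch in C. It is
only a model: 'struct arg_regs' and the helper names are hypothetical
and stand in for the first three integer argument registers at
function entry; they are not part of [1] or of any tracing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the first three integer argument registers
 * at function entry. */
struct arg_regs {
    long r[3];
};

/* What a caller sets up for the optimized 'foo(int a, int c)'.
 * The third register is never written for this call, so it keeps
 * whatever was there before; 'stale' stands in for that value. */
static struct arg_regs call_true_foo(int a, int c, long stale)
{
    struct arg_regs regs = { { a, c, stale } };
    return regs;
}

/* A tracer that trusts the source-level signature reads a, b, c
 * from registers 0, 1, 2. */
static void read_as_source(const struct arg_regs *regs,
                           int *a, int *b, int *c)
{
    assert(regs != NULL);
    *a = (int)regs->r[0];
    *b = (int)regs->r[1];   /* this is really 'c' */
    *c = (int)regs->r[2];   /* this is the stale value */
}

/* A tracer that uses the true signature reads a, c from
 * registers 0, 1. */
static void read_as_true(const struct arg_regs *regs, int *a, int *c)
{
    assert(regs != NULL);
    *a = (int)regs->r[0];
    *c = (int)regs->r[1];
}
```

With a call modelled as call_true_foo(1, 3, stale), read_as_source()
reports b == 3 and c == stale, while read_as_true() reports a == 1
and c == 3.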

The link [2] shows how the true signature may be used in pahole
to generate vmlinux BTF with true signatures.

The link [3] has some history and discussion about which
DWARF format we should adopt.

Proposed Format
===============

Currently, with [1] the proposed format is

  $ clang -O2 -c -g test.c -mllvm -enable-changed-func-dbinfo
  $ llvm-dwarfdump test.o
  0x0000000c: DW_TAG_compile_unit
                ...
  0x0000005c:   DW_TAG_subprogram
                  DW_AT_low_pc    (0x0000000000000010)
                  DW_AT_high_pc   (0x0000000000000015)
                  DW_AT_frame_base        (DW_OP_reg7 RSP)
                  DW_AT_call_all_calls    (true)
                  DW_AT_name      ("foo")
                  DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c")
                  DW_AT_decl_line (3)
                  DW_AT_prototyped        (true)
                  DW_AT_calling_convention        (DW_CC_nocall)
                  DW_AT_type      (0x000000b1 "char *")

  0x0000006c:     DW_TAG_formal_parameter
                    DW_AT_location        (DW_OP_reg5 RDI)
                    DW_AT_name    ("a")
                    DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
                    DW_AT_decl_line       (3)
                    DW_AT_type    (0x000000ba "t *")

  0x00000076:     DW_TAG_formal_parameter
                    DW_AT_name    ("b")
                    DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
                    DW_AT_decl_line       (3)
                    DW_AT_type    (0x000000ce "int")

  0x0000007e:     DW_TAG_formal_parameter
                    DW_AT_location        (DW_OP_reg4 RSI)
                    DW_AT_name    ("d")
                    DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
                    DW_AT_decl_line       (3)
                    DW_AT_type    (0x000000ba "t *")

  0x00000088:     DW_TAG_call_site
                    ...

  0x0000009d:     NULL
                  ...
  0x000000d2:   DW_TAG_inlined_subroutine
                  DW_AT_name      ("foo")
                  DW_AT_type      (0x000000b1 "char *")
                  DW_AT_artificial        (true)
                  DW_AT_specification     (0x0000005c "foo")

  0x000000dc:     DW_TAG_formal_parameter
                    DW_AT_name    ("a")
                    DW_AT_type    (0x000000ba "t *")

  0x000000e2:     DW_TAG_formal_parameter
                    DW_AT_name    ("d")
                    DW_AT_type    (0x000000ba "t *")

  0x000000e8:     NULL

Basically, a DW_TAG_inlined_subroutine placed immediately under
DW_TAG_compile_unit encodes the true signature. Its
DW_AT_specification refers to the DW_TAG_subprogram of the actual
function (generated from the DISubprogram). The format has been
agreed on with Jose Marchesi and David Faust from gcc.
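
As a rough sketch of how a consumer might use this layout, the C
fragment below looks up the true-signature entry for a given
subprogram. 'struct die' and find_true_signature() are hypothetical
simplifications for illustration, not an existing library API. The
fallback to the DW_TAG_subprogram signature when nothing matches is
my assumption, based on the fact that most signatures are unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum die_tag { TAG_SUBPROGRAM, TAG_INLINED_SUBROUTINE };

/* Hypothetical, simplified view of a DIE that is a direct child of
 * the compile unit. */
struct die {
    uint64_t offset;         /* DIE offset, e.g. 0x5c */
    enum die_tag tag;
    int artificial;          /* DW_AT_artificial is present and true */
    uint64_t specification;  /* DW_AT_specification target, 0 if absent */
};

/* Return the DIE carrying the true signature for the subprogram at
 * sp_offset, or NULL if there is none (assumed to mean the signature
 * is unchanged and the DW_TAG_subprogram itself should be used). */
static const struct die *find_true_signature(const struct die *dies,
                                             size_t n, uint64_t sp_offset)
{
    assert(dies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct die *d = &dies[i];

        if (d->tag == TAG_INLINED_SUBROUTINE && d->artificial &&
            d->specification == sp_offset)
            return d;
    }
    return NULL;
}
```

For the dump above, a lookup with sp_offset 0x5c would return the
entry at 0xd2, whose children are the parameters 'a' and 'd'.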

The following is another example:

  $ clang -O2 -c -g test.c -mllvm -enable-changed-func-dbinfo
  $ llvm-dwarfdump test.o
  ...
  0x0000004e:   DW_TAG_subprogram
                  DW_AT_low_pc    (0x0000000000000010)
                  DW_AT_high_pc   (0x0000000000000015)
                  DW_AT_frame_base        (DW_OP_reg7 RSP)
                  DW_AT_call_all_calls    (true)
                  DW_AT_name      ("foo")
                  DW_AT_decl_file ("/home/yhs/tests/sig-change/struct/test.c")
                  DW_AT_decl_line (2)
                  DW_AT_prototyped        (true)
                  DW_AT_calling_convention        (DW_CC_nocall)
                  DW_AT_type      (0x0000006d "long")

  0x0000005e:     DW_TAG_formal_parameter
                    DW_AT_location        (DW_OP_piece 0x8, DW_OP_reg5 RDI, DW_OP_piece 0x8)
                    DW_AT_name    ("arg")
                    DW_AT_decl_file       ("/home/yhs/tests/sig-change/struct/test.c")
                    DW_AT_decl_line       (2)
                    DW_AT_type    (0x00000099 "t")

  0x0000006c:     NULL
  ...
  0x00000088:   DW_TAG_inlined_subroutine
                  DW_AT_name      ("foo")
                  DW_AT_type      (0x0000006d "long")
                  DW_AT_artificial        (true)
                  DW_AT_specification     (0x0000004e "foo")

  0x00000092:     DW_TAG_formal_parameter
                    DW_AT_name    ("b")
                    DW_AT_type    (0x0000006d "long")

  0x00000098:     NULL

In this case, the source-level parameter 'arg' is a 16-byte struct
'struct t {long a; long b;};'. But the function only uses the second
field of the struct, so the true signature is 'long foo(long b)'. With
the true signature, users can easily construct the BPF-based
tracing program.
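
For illustration, the two forms can be written side by side in C.
The body of foo() is not shown in this mail, so the sketch assumes
the simplest case, where foo() just returns the second field. The
names foo_source, foo_true and check_equivalent are made up here.

```c
#include <assert.h>

struct t {
    long a;
    long b;
};

/* Source-level form: the whole 16-byte struct is passed by value.
 * Assumed body: only the second field is used. */
static long foo_source(struct t arg)
{
    return arg.b;
}

/* Form described by the true signature: only 'b' is passed. */
static long foo_true(long b)
{
    return b;
}

/* Both forms compute the same result for any input. */
static void check_equivalent(struct t arg)
{
    assert(foo_source(arg) == foo_true(arg.b));
}
```

A tracer attached to the optimized function therefore sees 'b' as
the first and only argument, not a 'struct t'.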

The above 'DW_TAG_inlined_subroutine' format is not in the DWARF
standard. As suggested by Orlando Cazalet-Hyams and Jeremy Morse (from
the LLVM side), I would like to get feedback and opinions from the
DWARF community on how to encode changed signatures in DWARF in a
better way.

Any suggestions are welcome!

  [1] https://github.com/llvm/llvm-project/pull/165310
  [2] https://lore.kernel.org/bpf/[email protected]/
  [3] https://discourse.llvm.org/t/rfc-identify-func-signature-change-in-llvm-compiled-kernel-image/82609
-- 
Dwarf-discuss mailing list
[email protected]
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
