Thanks for reaching out,

At least for me, the proposal still assumes a lot of knowledge about how
BPF works, which makes the question hard to follow.

Ideally, it'd be good to have enough information to motivate the discussion
without having to dig too far into the BPF implementation details... but I
know that's a hard balance.

Could you walk through in more detail what BPF tracing is doing, and how it
conflicts with cases where the source signature doesn't match the optimized
signature? It'd be good to see more clearly how the assumption that they do
match causes problems along the way - there might be other solutions to
that problem.

On Thu, Dec 4, 2025 at 8:37 AM Y Song <[email protected]> wrote:

> Hi,
>
> Recently I have been working on emitting true signatures
> in DWARF from LLVM ([1]). Currently, the DWARF DISubprogram
> encodes the source-level signature. But compiler optimizations
> may modify signatures, e.g., from
>   static int foo(int a, int b, int c) { ... }
> to
>   static int foo(int a, int c) { ... }
> where the parameter 'int b' is removed by the compiler.
>
> In most cases the signature won't change after
> compiler optimization, but some function signatures
> do get changed. Encoding those changed signatures in
> DWARF helps users identify the true signature and
> improves productivity.
>
> Motivation
> ==========
>
> My particular use case is BPF-based Linux kernel
> tracing. When tracing a kernel function, the user needs
> to know the actual signature. This is critical.
>
> For example, if the actual signature is
>   static int foo(int a, int c) { ... }
> and the source signature is
>   static int foo(int a, int b, int c) { ... }
>
> If users trace function foo() with the signature
>   int foo(int a, int b, int c);
> then they may get incorrect results, because the
> source-level parameter 'int b' actually takes
> the value of the true-signature 'int c', and the source-level
> parameter 'int c' takes a garbage value.
>
> In this case, the true signature is essential for
> correct tracing.
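
To make the mismatch above concrete, here is a minimal sketch. It is
only an illustration: 'struct arg_regs', 'struct decoded', and both
decode functions are hypothetical names, not kernel or BPF APIs, and it
assumes integer arguments arrive in consecutive registers (as with RDI,
RSI, RDX in the x86-64 SysV ABI).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the first three integer argument registers at
 * function entry (RDI, RSI, RDX in the x86-64 SysV ABI). Illustration
 * only, not a real BPF or kernel structure. */
struct arg_regs {
    long r[3];
};

/* What a tracer would report for the source-level parameters. */
struct decoded {
    long a;
    long b;
    long c;
    int b_present; /* 0 when 'b' does not exist in the emitted code */
};

/* Decode using the source signature: int foo(int a, int b, int c). */
static struct decoded decode_with_source_sig(const struct arg_regs *regs)
{
    assert(regs != NULL);
    struct decoded d = { regs->r[0], regs->r[1], regs->r[2], 1 };
    return d;
}

/* Decode using the true signature: int foo(int a, int c). */
static struct decoded decode_with_true_sig(const struct arg_regs *regs)
{
    assert(regs != NULL);
    struct decoded d = { regs->r[0], 0, regs->r[1], 0 };
    return d;
}
```

If the registers hold {1, 3, stale} for a call to the optimized
foo(a=1, c=3), the source-signature decoder reports b = 3 and c = the
stale value, while the true-signature decoder reports c = 3.
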
>
> The link [2] shows how the true signature may be used in pahole
> to generate a vmlinux BTF that has true signatures.
>
> The link [3] has some history and discussion about which
> DWARF format we should adopt.
>
> Proposed Format
> ===============
>
> Currently, with [1] the proposed format is
>
>   $ clang -O2 -c -g test.c -mllvm -enable-changed-func-dbinfo
>   $ llvm-dwarfdump test.o
>   0x0000000c: DW_TAG_compile_unit
>                 ...
>   0x0000005c:   DW_TAG_subprogram
>                   DW_AT_low_pc    (0x0000000000000010)
>                   DW_AT_high_pc   (0x0000000000000015)
>                   DW_AT_frame_base        (DW_OP_reg7 RSP)
>                   DW_AT_call_all_calls    (true)
>                   DW_AT_name      ("foo")
>                   DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c")
>                   DW_AT_decl_line (3)
>                   DW_AT_prototyped        (true)
>                   DW_AT_calling_convention        (DW_CC_nocall)
>                   DW_AT_type      (0x000000b1 "char *")
>
>   0x0000006c:     DW_TAG_formal_parameter
>                     DW_AT_location        (DW_OP_reg5 RDI)
>                     DW_AT_name    ("a")
>                     DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
>                     DW_AT_decl_line       (3)
>                     DW_AT_type    (0x000000ba "t *")
>
>   0x00000076:     DW_TAG_formal_parameter
>                     DW_AT_name    ("b")
>                     DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
>                     DW_AT_decl_line       (3)
>                     DW_AT_type    (0x000000ce "int")
>
>   0x0000007e:     DW_TAG_formal_parameter
>                     DW_AT_location        (DW_OP_reg4 RSI)
>                     DW_AT_name    ("d")
>                     DW_AT_decl_file       ("/home/yhs/tests/sig-change/deadarg/test.c")
>                     DW_AT_decl_line       (3)
>                     DW_AT_type    (0x000000ba "t *")
>
>   0x00000088:     DW_TAG_call_site
>                     ...
>
>   0x0000009d:     NULL
>                   ...
>   0x000000d2:   DW_TAG_inlined_subroutine
>                   DW_AT_name      ("foo")
>                   DW_AT_type      (0x000000b1 "char *")
>                   DW_AT_artificial        (true)
>                   DW_AT_specification     (0x0000005c "foo")
>
>   0x000000dc:     DW_TAG_formal_parameter
>                     DW_AT_name    ("a")
>                     DW_AT_type    (0x000000ba "t *")
>
>   0x000000e2:     DW_TAG_formal_parameter
>                     DW_AT_name    ("d")
>                     DW_AT_type    (0x000000ba "t *")
>
>   0x000000e8:     NULL
>
> Basically, a DW_TAG_inlined_subroutine placed immediately under
> DW_TAG_compile_unit encodes the true signature. Its DW_AT_specification
> refers to the original DW_TAG_subprogram (the actual DISubprogram).
> The format has been agreed with Jose Marchesi and David Faust from GCC.
>
> The following is another example:
>
>   $ clang -O2 -c -g test.c -mllvm -enable-changed-func-dbinfo
>   $ llvm-dwarfdump test.o
>   ...
>   0x0000004e:   DW_TAG_subprogram
>                   DW_AT_low_pc    (0x0000000000000010)
>                   DW_AT_high_pc   (0x0000000000000015)
>                   DW_AT_frame_base        (DW_OP_reg7 RSP)
>                   DW_AT_call_all_calls    (true)
>                   DW_AT_name      ("foo")
>                   DW_AT_decl_file ("/home/yhs/tests/sig-change/struct/test.c")
>                   DW_AT_decl_line (2)
>                   DW_AT_prototyped        (true)
>                   DW_AT_calling_convention        (DW_CC_nocall)
>                   DW_AT_type      (0x0000006d "long")
>
>   0x0000005e:     DW_TAG_formal_parameter
>                     DW_AT_location        (DW_OP_piece 0x8, DW_OP_reg5 RDI, DW_OP_piece 0x8)
>                     DW_AT_name    ("arg")
>                     DW_AT_decl_file       ("/home/yhs/tests/sig-change/struct/test.c")
>                     DW_AT_decl_line       (2)
>                     DW_AT_type    (0x00000099 "t")
>
>   0x0000006c:     NULL
>   ...
>   0x00000088:   DW_TAG_inlined_subroutine
>                   DW_AT_name      ("foo")
>                   DW_AT_type      (0x0000006d "long")
>                   DW_AT_artificial        (true)
>                   DW_AT_specification     (0x0000004e "foo")
>
>   0x00000092:     DW_TAG_formal_parameter
>                     DW_AT_name    ("b")
>                     DW_AT_type    (0x0000006d "long")
>
>   0x00000098:     NULL
>
> In this case, the source-level parameter 'arg' is a 16-byte struct
> 'struct t {long a; long b;};'. But the function only uses the second
> field of the struct, so the true signature is 'long foo(long b)'. With
> the true signature, users can easily construct the BPF-based
> tracing program.
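
As I read the dump, the location (DW_OP_piece 0x8, DW_OP_reg5 RDI,
DW_OP_piece 0x8) says that the first 8 bytes of 'arg' have no location
and the next 8 bytes live in RDI. A rough sketch of how a consumer
might evaluate such a piece list follows. 'struct piece' and
'read_struct_pieces' are hypothetical names for illustration, not part
of any DWARF library, and the register file is modeled as a plain array
indexed by DWARF register number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct t { long a; long b; };

/* Hypothetical description of one DW_OP_piece entry: 'reg' is a DWARF
 * register number, or -1 when the piece has no location (optimized out). */
struct piece {
    int reg;
    size_t size;
};

/* Rebuild a struct t from a piece list. regs[] is indexed by DWARF
 * register number. valid[i] is set to 1 for every byte that could be
 * recovered. Returns 0 on success, -1 if the pieces do not fit.
 * Pieces smaller than a register assume a little-endian host. */
static int read_struct_pieces(const struct piece *pieces, size_t n,
                              const long *regs, struct t *out,
                              unsigned char valid[sizeof(struct t)])
{
    size_t off = 0;

    assert(pieces != NULL && regs != NULL && out != NULL && valid != NULL);
    memset(out, 0, sizeof(*out));
    memset(valid, 0, sizeof(struct t));
    for (size_t i = 0; i < n; i++) {
        size_t size = pieces[i].size;

        if (size > sizeof(long) || off + size > sizeof(struct t))
            return -1;
        if (pieces[i].reg >= 0) {
            /* Copy the bytes held in the register into the struct image. */
            memcpy((unsigned char *)out + off, &regs[pieces[i].reg], size);
            memset(valid + off, 1, size);
        }
        off += size;
    }
    return 0;
}
```

With the piece list { {-1, 8}, {5, 8} } from the dump, only 'b' is
recovered, and it comes from register 5 (RDI). That is the first
argument register, which matches the true signature 'long foo(long b)'.
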
>
> The above use of 'DW_TAG_inlined_subroutine' is not in the DWARF
> standard. At the suggestion of Orlando Cazalet-Hyams and Jeremy Morse
> (from the LLVM side), I would like to get feedback from the DWARF
> community on a better way to encode changed signatures in DWARF.
>
> Any suggestions are welcome!
>
>   [1] https://github.com/llvm/llvm-project/pull/165310
>   [2] https://lore.kernel.org/bpf/[email protected]/
>   [3] https://discourse.llvm.org/t/rfc-identify-func-signature-change-in-llvm-compiled-kernel-image/82609
>
-- 
Dwarf-discuss mailing list
[email protected]
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
