On 17/5/26 05:33, Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
> 
> Add syntax to the FETCHARGS parsing of probes to allow the use of
> structure and member names to get the offsets to dereference pointers.
> 
> Currently, a dereference must be a number, where the user has to figure
> out manually the offset of a member of a structure that they want to
> reference. For example, to get the size of a kmem_cache that was passed to
> the function kmem_cache_alloc_noprof, one would need to do:
> 
>  # cd /sys/kernel/tracing
>  # echo 'f:cache kmem_cache_alloc_noprof size=+0x18($arg1):u32' >> dynamic_events
> 
> This requires knowing that the offset of size is 0x18, which can be found
> with gdb:
> 
>   (gdb) p &((struct kmem_cache *)0)->size
>   $1 = (unsigned int *) 0x18
> 
> If BTF is in the kernel, it can be used to find this with names, where the
> user doesn't need to find the actual offset:
> 
>  # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events
> 
> Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> 
>   +STRUCT.MEMBER[.MEMBER[..]]
> 
> The delimiter is '.' and the first item is the structure name. Then the
> member of the structure to get the offset of. If that member is an
> embedded structure, another '.MEMBER' may be added to get the offset of
> its members with respect to the original value.
> 
>   "+kmem_cache.size($arg1)" is equivalent to:
> 
>   (*(struct kmem_cache *)$arg1).size
> 
> Anonymous structures are also handled:
> 
>   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events
> 
> Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:
> 
>   (*(struct net_device *)((*(struct sk_buff *)($skbaddr)).dev)->name)
> 
> Note that "dev" of struct sk_buff is inside an anonymous structure:
> 
> struct sk_buff {
>       union {
>               struct {
>                       /* These two members must be first to match sk_buff_head. */
>                       struct sk_buff          *next;
>                       struct sk_buff          *prev;
> 
>                       union {
>                               struct net_device       *dev;
>                               [..]
>                       };
>               };
>               [..]
>       };
> 
> This will allow up to three deep of anonymous structures before it will
> fail to find a member.
> 
> The above produces:
> 
>     sshd-session-1080    [000] b..5.  1526.337161: xmit: (net.net_dev_xmit) arg1="enp7s0"
> 
> And nested structures can be found by adding more members to the arg:
> 
>   # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events
> 
> The above is equivalent to:
> 
>   *((*(struct dentry *)(*(struct file *)$arg2).f_path.dentry)->d_name.name)
> 
> And produces:
> 
>        trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> 
Hi Steve,

Great to see BTF being integrated into tracing.

I'm glad to share my BPF tool, bpfsnoop [1], which takes a similar
approach to inspecting argument data.

Read device name:
bpfsnoop -t net_dev_xmit --output-arg 'str(skb->dev->name)' --limit-events 20
- net_dev_xmit[tp] args=((struct sk_buff *)skb=0xffff88818821d4e8,
(int)rc=0, (struct net_device *)dev=0xffff88984ba64000, (unsigned
int)skb_len=0x1f2/498) cpu=2 process=(0:swapper/2)
timestamp=18:06:17.309492697
Arg attrs: (array(char[16]))'str(skb->dev->name)'="eth0"

Read dentry name:
bpfsnoop -k 'vfs_read' --output-arg 'str((file->f_path.dentry)->d_name.name)' --limit-events 20
← vfs_read args=((struct file *)file=0xffff888175e08400, (char
*)buf=0x55c7a1168400(0x0/0), (size_t)count=0x10000/65536, (loff_t
*)pos=0xffffc9000f707bb0(0)) retval=(long int)510 cpu=3
process=(339834:sudo) timestamp=18:24:16.22021166
Arg attrs: (unsigned char *)'str((file->f_path.dentry)->d_name.name)'="ptmx"

bpfsnoop provides a friendly way to inspect argument data using C
expressions. Under the hood, it compiles the C expressions given via
--filter-arg/--output-arg into BPF bytecode, using BTF to resolve the
struct/union member accesses. (I've been too lazy to document its
internals, but you can study the code with AI assistance.)
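
To give a feel for the member-access resolution step, here is a minimal
Python sketch. It is not bpfsnoop's actual code: the TYPES table is a
hand-written, hypothetical stand-in for BTF, its offsets are invented, and
the resolve() helper is made up for illustration. It only understands plain
"->" and "." chains, and it returns one offset per pointer dereference.

```python
import re

# Hypothetical stand-in for BTF: struct -> member -> (byte offset, member type).
# Offsets are invented for illustration, and anonymous structs/unions are
# already flattened; real values would come from the running kernel's BTF.
TYPES = {
    "sk_buff":    {"dev": (16, "net_device*")},
    "net_device": {"name": (0, "char[16]")},
    "file":       {"f_path": (64, "path")},
    "path":       {"mnt": (0, "vfsmount*"), "dentry": (8, "dentry*")},
    "dentry":     {"d_name": (32, "qstr")},
    "qstr":       {"hash_len": (0, "u64"), "name": (8, "unsigned char*")},
}


def resolve(root_struct, expr):
    """Turn 'var->a.b->c' into per-dereference offsets plus the final type.

    Every offset except the last is followed by a pointer load; the last one
    is the offset of the final member from the last loaded pointer.
    """
    tokens = re.findall(r"->|\.|\w+", expr.replace(" ", ""))
    ops, members = tokens[1::2], tokens[2::2]
    if len(tokens) < 3 or len(ops) != len(members):
        raise ValueError(f"cannot parse {expr!r}")

    cur_type = root_struct + "*"  # the variable is a pointer to root_struct
    offsets, pending = [], 0
    for i, (op, member) in enumerate(zip(ops, members)):
        is_ptr = cur_type.endswith("*")
        if (op == "->") != is_ptr:
            raise ValueError(f"'{op}{member}' does not match type {cur_type}")
        if is_ptr and i > 0:
            offsets.append(pending)  # load the pointer stored at this offset
            pending = 0
        struct = cur_type.rstrip("*")
        try:
            off, cur_type = TYPES[struct][member]
        except KeyError:
            raise ValueError(f"no member {member!r} in struct {struct!r}") from None
        pending += off  # embedded members accumulate into one offset
    offsets.append(pending)
    return offsets, cur_type


if __name__ == "__main__":
    print(resolve("sk_buff", "skb->dev->name"))
    print(resolve("file", "file->f_path.dentry->d_name.name"))
```

With the made-up table above, "file->f_path.dentry->d_name.name" resolves to
the offsets [72, 40]: f_path.dentry collapses into a single offset because
f_path is embedded, and a new offset starts only after a pointer is followed.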

After developing this feature for bpfsnoop, I had a somewhat insane idea:
embed a lightweight C compiler into the trace tool, compile each C
expression into BPF bytecode, and load the resulting BPF program to
filter/output arguments. Users would then be able to filter/output
arguments using C expressions. The idea seemed too crazy to post to the
trace mailing list at the time, as I wasn't familiar with the trace
infrastructure.

[1] https://github.com/bpfsnoop/bpfsnoop/

Thanks,
Leon

