On Wed, Jun 3, 2026 at 7:43 PM Yunseong Kim <[email protected]> wrote:
>
> Add tools/kcov-dataflow/ with:
>
> - trigger.c: userspace consumer that opens /sys/kernel/debug/kcov_dataflow,
>   mmaps the buffer, enables recording, triggers a kernel path, and dumps
>   the captured TLV records.
>
> - kcov-view.py: visualization tool that parses and pretty-prints the
>   binary TLV buffer with struct field expansion and symbol resolution.
>
> - eight_args_c/eight_args_mod.c: stress test with 1-8 argument functions
>   verifying correct capture of register and stack-passed arguments.

I think the tests should better go to tools/testing/selftests.

> diff --git a/tools/kcov-dataflow/.gitignore b/tools/kcov-dataflow/.gitignore
> new file mode 100644
> index 000000000000..1f35df8fbd07
> --- /dev/null
> +++ b/tools/kcov-dataflow/.gitignore

I am not sure about the conventions, but a bunch of other .gitignore
files have SPDX headers.

> diff --git a/tools/kcov-dataflow/deep_module/Makefile 
> b/tools/kcov-dataflow/deep_module/Makefile
> new file mode 100644
> index 000000000000..6afed580dc9a
> --- /dev/null
> +++ b/tools/kcov-dataflow/deep_module/Makefile

Makefiles must have SPDX headers.

> diff --git a/tools/kcov-dataflow/deep_module/deep_chain_mod.c 
> b/tools/kcov-dataflow/deep_module/deep_chain_mod.c
> new file mode 100644
> index 000000000000..786e23c5d213
> --- /dev/null
> +++ b/tools/kcov-dataflow/deep_module/deep_chain_mod.c
> @@ -0,0 +1,224 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * deep_chain_mod.c - Demonstrates kcov_dataflow tracing through 10 nested
> + * function calls. An attacker-controlled "offset" value propagates from
> + * the entry point through transformations until it causes an OOB write
> + * in the deepest function.

I don't fully understand the purpose of this file. Are you going to
parse its output for testing?

> + *
> + * Call chain:
> + *   entry_handler → parse_request → validate_header → extract_payload →
> + *   transform_data → apply_filter → compute_index → lookup_slot →
> + *   write_slot → commit_write (BUG: OOB here)

Call me old fashioned, but I think we can easily avoid the non-ASCII
characters here.


> +/* Function 10 (DEEPEST): The vulnerable write */
> +static noinline int commit_write(struct slot_table *table, u32 index, u64 
> value)
> +{
> +       /* BUG: no bounds check on index — if index >= 8, OOB write */
> +       table->slots[index] = value;
> +       return 0;
> +}

Does this call chain need to be this long?
I assume it was extracted from some real-world example, but maybe
pruning it to 3-5 calls would demonstrate the capabilities of dataflow
tracing just as well?


> +
> +       table = kzalloc(sizeof(*table), GFP_KERNEL);
> +       if (!table)
> +               return -ENOMEM;
> +       table->num_slots = 8;
> +
> +       /* The tainted data flow:
Please make sure to conform to
https://docs.kernel.org/process/coding-style.html#commenting

> +
> +/* Trigger: constructs a malicious request that causes index=12 (OOB) */
> +static ssize_t deep_trigger_write(struct file *file, const char __user *ubuf,
> +                                 size_t count, loff_t *ppos)
> +{
> +       u8 *buf;
> +       struct request_header *hdr;
> +       struct payload *pl;
> +
> +       buf = kzalloc(256, GFP_KERNEL);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       /* Craft malicious request */
> +       hdr = (struct request_header *)buf;
> +       hdr->magic = 0x50524F54;       /* valid magic */
> +       hdr->version = 1;              /* valid version */
> +       hdr->payload_offset = 16;      /* offset to payload (valid position) 
> */
> +       hdr->payload_size = sizeof(struct payload);
> +
> +       /* Craft payload that will produce OOB index */
> +       pl = (struct payload *)(buf + 16);
> +       pl->session_id = 0xAAAABBBBCCCCDDDDULL;

Much is happening in this code, making it hard to review, but all
these values do not seem to be used anywhere.


> +
> +noinline u64 func2(u64 a1, u64 a2)
> +{
> +       return a1 + a2;
> +}
> +EXPORT_SYMBOL(func2);
> +
> +noinline u64 func3(u64 a1, u64 a2, u64 a3)
> +{
> +       return a1 + a2 + a3;
> +}
> +EXPORT_SYMBOL(func3);

Would it be more readable to use some macro magic here?

E.g.:
```
#define DEFINE_SUM_FUNC(n, sum_expr, ...)           \
noinline u64 func##n(__VA_ARGS__)               \
{                                               \
return sum_expr;                            \
}                                               \
EXPORT_SYMBOL(func##n)

DEFINE_SUM_FUNC(1, a1, u64 a1);
DEFINE_SUM_FUNC(2, a1 + a2, u64 a1, u64 a2);
DEFINE_SUM_FUNC(3, a1 + a2 + a3, u64 a1, u64 a2, u64 a3);
DEFINE_SUM_FUNC(4, a1 + a2 + a3 + a4, u64 a1, u64 a2, u64 a3, u64 a4);
DEFINE_SUM_FUNC(5, a1 + a2 + a3 + a4 + a5, u64 a1, u64 a2, u64 a3, u64
a4, u64 a5);
DEFINE_SUM_FUNC(6, a1 + a2 + a3 + a4 + a5 + a6, u64 a1, u64 a2, u64
a3, u64 a4, u64 a5, u64 a6);
DEFINE_SUM_FUNC(7, a1 + a2 + a3 + a4 + a5 + a6 + a7, u64 a1, u64 a2,
u64 a3, u64 a4, u64 a5, u64 a6, u64 a7);
DEFINE_SUM_FUNC(8, a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8, u64 a1, u64
a2, u64 a3, u64 a4, u64 a5, u64 a6, u64 a7, u64 a8);
```

You could as well define the function prototypes instead of applying
-Wno-missing-prototypes.


> +{
> +       pr_info("func1(0x11)=0x%llx\n", func1(0x11));
> +       pr_info("func2(0x11,0x22)=0x%llx\n", func2(0x11, 0x22));
> +       pr_info("func3(0x11,0x22,0x33)=0x%llx\n",
> +               func3(0x11, 0x22, 0x33));
> +       pr_info("func4(0x11,..,0x44)=0x%llx\n",
> +               func4(0x11, 0x22, 0x33, 0x44));
> +       pr_info("func5(0x11,..,0x55)=0x%llx\n",
> +               func5(0x11, 0x22, 0x33, 0x44, 0x55));
> +       pr_info("func6(0x11,..,0x66)=0x%llx\n",
> +               func6(0x11, 0x22, 0x33, 0x44, 0x55, 0x66));
> +       pr_info("func7(0x11,..,0x77)=0x%llx\n",
> +               func7(0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77));
> +       pr_info("func8(0x11,..,0x88)=0x%llx\n",
> +               func8(0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88));
> +       return count;
> +}

The inconsistent output format here suggests that the output is never validated.

> +
> +static const struct proc_ops ops = { .proc_write = trigger_write };

This file should belong to debugfs rather than /proc/.

> diff --git a/tools/kcov-dataflow/eight_args_rust/eight_args_rust.rs 
> b/tools/kcov-dataflow/eight_args_rust/eight_args_rust.rs
> new file mode 100644
> index 000000000000..11bbe1449eaf
> --- /dev/null
> +++ b/tools/kcov-dataflow/eight_args_rust/eight_args_rust.rs
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//! Verify kcov_dataflow captures 1-arg through 8-arg functions.
> +//! Write to /sys/kernel/debug/test_args_rust to trigger all 8.

Please make sure triggers for all your tests are consistent,
preferably somewhere under /sys/kernel/debug/kcov_testing/ or similar.



> +
> +Reads both /sys/kernel/debug/kcov (PC trace) and 
> /sys/kernel/debug/kcov_dataflow
> +(args/ret), correlates by PC, and produces a human-readable call trace with
> +argument values and struct field expansion.
> +
> +Usage (inside guest or with appropriate permissions):
> +    python3 kcov-view.py <trigger_command>
> +
> +Example:
> +    python3 kcov-view.py "echo x > /proc/uaf_trigger"
> +
> +Output:
> +    func+0x0 [module]
> +      → a(arg[0]=0x1, arg[1]=0x2, arg[2]=0x3, arg[3]=struct{.f[0]=1, 
> .f[1]=2, .f[2]=3})
> +        ← ret = struct{.f[0]=1, .f[1]=2, .f[2]=3}
> +      → a(arg[0]=0x0, arg[1]=0x0, arg[2]=0x1, arg[3]=NULL)
> +        ← ret = 0x0
> +"""
> +import os, sys, struct, mmap, fcntl, subprocess, re, ctypes
> +from collections import defaultdict
> +
> +# Ioctl definitions (x86_64)

Why is this code x86-specific?

> +def parse_dataflow(buf, n):
> +    """Parse TLV records from kcov_dataflow buffer into a list of events."""
> +    events = []
> +    i = 1
> +    while i <= n and i < BUF_SIZE:
> +        hdr = buf[i]
> +        typ = hdr & 0xF0000000
> +        seq = hdr & 0x00FFFFFF

These numbers are used more than once - please declare them as
constants instead.

> +
> +        if typ not in (0xE0000000, 0xF0000000):
Same here, no one knows what these numbers stand for.


> +        while i <= n and i < BUF_SIZE:
> +            v = buf[i]
> +            vtype = v & 0xF0000000

Please use helper functions to extract the type, size and whatnot.

> +            if vtype == 0xE0000000 or vtype == 0xF0000000:
> +                break
> +            fields.append(v)
> +            i += 1
> +
> +        if typ == 0xE0000000:
> +            arg_idx = (meta >> 56) & 0xFF
> +            arg_sz = (meta >> 48) & 0xFF
> +            ptr = meta & 0xFFFFFFFFFFFF
> +            events.append({
> +                "type": "entry", "seq": seq, "pc": pc,
> +                "arg_idx": arg_idx, "arg_size": arg_sz,
> +                "ptr": ptr, "fields": fields
> +            })

This looks like it could benefit from OOP.


> diff --git a/tools/kcov-dataflow/trigger.c b/tools/kcov-dataflow/trigger.c
> new file mode 100644
> index 000000000000..7fa7b4414770
> --- /dev/null
> +++ b/tools/kcov-dataflow/trigger.c
> @@ -0,0 +1,125 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trigger.c - Uses /sys/kernel/debug/kcov_dataflow to capture

This code seems to partially duplicate kcov-view.py, do you need both?

> +#define COVER_SIZE (64 * 1024)  /* 64K u64 words = 512KB */
> +
> +static void dump_buffer(uint64_t *cover, uint64_t n)
> +{
> +       uint64_t i = 1;
> +
> +       printf("=== KCOV Dataflow TLV Dump (%lu words) ===\n", n);
> +       while (i <= n && i < COVER_SIZE) {
> +               uint64_t hdr = cover[i];
> +               uint64_t type = hdr & 0xF0000000ULL;

Maybe we could use unions to unpack the header?
I don't mind the masks either, but please declare them as constants.


> +int main(int argc, char **argv)
> +{
> +       const char *trigger_path = "/proc/uaf_trigger";

I couldn't find /proc/uaf_trigger anywhere, does it belong to this series?
If trigger.c is a general-purpose tool rather than a test script, it
shouldn't depend on the test modules.





-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Reply via email to