On Tue, 24 Feb 2026 11:56:47 +0100
Maxime Peim <[email protected]> wrote:

> Add async flow API mode to test-flow-perf application for improved
> flow rule insertion performance. The async API allows batching flow
> rule creation operations and processing completions in bulk, reducing
> per-rule overhead.
> 
> New command line options:
>   --async: enable async flow API mode
>   --async-queue-size=N: size of async queues (default: 1024)
>   --async-push-batch=N: flows to batch before push (default: 256)
> 
> Signed-off-by: Maxime Peim <[email protected]>
> ---
>  app/test-flow-perf/actions_gen.c | 172 +++++++++++++
>  app/test-flow-perf/actions_gen.h |   4 +
>  app/test-flow-perf/async_flow.c  | 239 ++++++++++++++++++
>  app/test-flow-perf/async_flow.h  |  41 ++++
>  app/test-flow-perf/items_gen.c   |  13 +
>  app/test-flow-perf/items_gen.h   |   4 +
>  app/test-flow-perf/main.c        | 410 ++++++++++++++++++++++++-------
>  app/test-flow-perf/meson.build   |   1 +
>  8 files changed, 798 insertions(+), 86 deletions(-)
>  create mode 100644 app/test-flow-perf/async_flow.c
>  create mode 100644 app/test-flow-perf/async_flow.h


Looked good to me, but a more detailed AI-assisted review found some things.

The most important one it found:

Race condition on `flow` variable — insert_flows_async() is called
concurrently from multiple cores but uses a file-scope static `flow`
pointer. One core can overwrite it between async_generate_flow() and
the flows_list assignment. Needs a local `struct rte_flow *flow;`
declaration.

But it flagged plenty more that you should address:

# Review: [PATCH] test/flow: add support for async API

**Patch**: test/flow: add support for async API
**Author**: Maxime Peim <[email protected]>
**Files changed**: 8 (798 insertions, 86 deletions)

---

## Errors

### 1. `alloca()` for `queue_attr_list` allocates wrong size (buffer overflow)

In `async_flow.c`, `async_flow_init_port()`:

```c
const struct rte_flow_queue_attr **queue_attr_list =
    alloca(sizeof(struct rte_flow_queue_attr) * nb_queues);
```

This allocates `nb_queues * sizeof(struct rte_flow_queue_attr)` bytes, but 
`queue_attr_list` is an array of **pointers** (`const struct 
rte_flow_queue_attr **`). It should allocate `nb_queues * sizeof(struct 
rte_flow_queue_attr *)`. If `sizeof(struct rte_flow_queue_attr) < 
sizeof(pointer)`, this is a buffer overflow when iterating `nb_queues` entries. 
If `sizeof(struct rte_flow_queue_attr) > sizeof(pointer)`, it wastes stack 
space but is not dangerous.

**Fix**:
```c
const struct rte_flow_queue_attr **queue_attr_list =
    alloca(sizeof(struct rte_flow_queue_attr *) * nb_queues);
```

Or more idiomatically: `alloca(sizeof(*queue_attr_list) * nb_queues)`.

### 2. `alloca()` with unbounded `nb_queues` — potential stack overflow

`nb_queues` comes from `mc_pool.cores_count`, which is user-controlled via 
`--cores`. It is capped by `port_info.max_nb_queues` when the device reports 
one, but when `max_nb_queues` is 0 or `UINT32_MAX` (values the code explicitly 
skips), no upper bound is enforced before the `alloca()` calls. A large value 
could blow the stack.

**Fix**: Add a reasonable upper bound check before the `alloca()` calls, or use 
`rte_malloc`/`malloc` instead.
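A minimal sketch of the bound check (the cap value and helper name are illustrative, not DPDK constants):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative cap; pick a value appropriate for the application. */
#define MAX_ASYNC_QUEUES 64

/* Validate the user-supplied queue count before any per-queue allocation.
 * Returns the count if usable, or 0 to signal an error to the caller. */
static uint32_t checked_nb_queues(uint32_t nb_queues)
{
	if (nb_queues == 0 || nb_queues > MAX_ASYNC_QUEUES)
		return 0;
	return nb_queues;
}
```

With a validated count, `malloc(sizeof(*queue_attr_list) * nb_queues)` (freed at the end of `async_flow_init_port()`) would avoid both the stack-size risk here and the element-size mistake from issue #1.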

### 3. `flow` variable used in `insert_flows_async()` without declaration

The function `insert_flows_async()` in `main.c` references `flow` (e.g., line 
994: `flow = generate_flow(...)`, line 1014: `flow = async_generate_flow(...)`) 
but `flow` is never declared as a local variable within this function. It 
appears to rely on a file-scope `static struct rte_flow *flow` variable. This 
creates a hidden data race: if multiple cores call `insert_flows_async()` 
concurrently (which they do — `flows_handler` is called per-core), they share 
the same `flow` pointer without synchronization. One core could overwrite 
`flow` between the `async_generate_flow()` call and the 
`flows_list[flow_index++] = flow` assignment.

**Fix**: Declare `struct rte_flow *flow;` as a local variable inside 
`insert_flows_async()`.

### 4. Missing NULL check on `flow` before measuring first-flow latency

In `insert_flows_async()`:

```c
flow = async_generate_flow(...);

if (counter == start_counter) {
    first_flow_latency = ...;
    printf(":: First Flow Latency ...\n");
}

if (force_quit)
    break;

if (!flow) {
    print_flow_error(error);
    rte_exit(EXIT_FAILURE, ...);
}
```

The `flow == NULL` check happens **after** the first-flow latency measurement 
and the `force_quit` check. If the first flow creation fails (`flow == NULL`) 
and `force_quit` is set, the code breaks out of the loop and proceeds to use 
`flows_list` which has a NULL entry at index 0, potentially causing issues 
during cleanup. More critically, if it's the first flow and it fails, the code 
prints a misleading "First Flow installed" latency message before discovering 
the failure.

**Fix**: Move the `if (!flow)` check immediately after `async_generate_flow()`, 
before the latency print.
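A self-contained sketch of the corrected ordering (stub names, not the patch's actual code): with the failure check running first, a failed first flow can never print a latency line.

```c
#include <stddef.h>

/* Stub standing in for async_generate_flow(); NULL signals failure. */
static void *stub_generate_flow(int should_fail)
{
	return should_fail ? NULL : (void *)0x1;
}

/* Returns -1 on creation failure, before any first-flow reporting. */
static int insert_one_flow(int should_fail, int is_first, int *latency_printed)
{
	void *flow = stub_generate_flow(should_fail);

	if (flow == NULL)
		return -1;		/* bail out before the latency message */

	if (is_first)
		*latency_printed = 1;	/* ":: First Flow Latency ..." goes here */

	return 0;
}
```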

### 5. `port_attr.nb_counters` not set — may cause configure failure

In `async_flow_init_port()`, `port_attr` is zero-initialized but `nb_counters` 
and `nb_aging_objects` are not set even though the flow actions may include 
`COUNT` or metering. Some PMDs require these to be nonzero when counters are 
used with async flows. This could cause silent failures or `rte_flow_configure` 
to reject the configuration.

This is a moderate-confidence concern (~60%) — it depends on the specific PMD, 
but the code doesn't attempt to set these fields at all.
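A sketch of the idea using a local stand-in struct (the real `rte_flow_port_attr` has more fields, and sizing the counters from the rule count is an assumption, not a DPDK requirement):

```c
#include <stdint.h>

/* Local stand-in for rte_flow_port_attr; only the two fields at issue. */
struct port_attr {
	uint32_t nb_counters;
	uint32_t nb_aging_objects;
};

/* Pre-size counter objects from the planned rule count when the action
 * template includes COUNT, so PMDs that validate these fields accept
 * the configuration. */
static struct port_attr make_port_attr(uint32_t rules_count, int has_count_action)
{
	struct port_attr attr = { 0 };

	if (has_count_action)
		attr.nb_counters = rules_count;
	return attr;
}
```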

### 6. `#include "rte_common.h"` — incorrect include style

In `main.c`:
```c
#include "rte_common.h"
```

DPDK public headers should be included with angle brackets: `#include 
<rte_common.h>`. Using quotes searches the local directory first, which is 
incorrect for a system/library header. This could also fail to compile if the 
local directory doesn't have a copy.

---

## Warnings

### 7. `memset` on `set_ipv6_mask` inside function body with `static` variable

In `fill_actions_template()`:
```c
static struct rte_flow_action_set_ipv6 set_ipv6_mask;
/* ... */
memset(set_ipv6_mask.ipv6_addr.a, 0xff, 16);
```

The `memset` is called **every time** `fill_actions_template()` is invoked, but 
the variable is `static`. This is redundant after the first call and suggests 
the initialization should either be done once (with a static flag or at file 
scope) or the variable shouldn't be `static`. It's not a bug, but it's wasteful 
and indicates a design inconsistency — all other static mask variables are 
initialized at declaration.
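One way to match the other masks, sketched with a stand-in type (the real `rte_flow_action_set_ipv6` layout differs, and `const` only works if no other field of the mask is written per call): initialize once at declaration and drop the per-call `memset`.

```c
#include <stdint.h>

/* Stand-in for rte_flow_action_set_ipv6; only the address field matters. */
struct set_ipv6 {
	uint8_t ipv6_addr[16];
};

/* All-ones mask initialized once at declaration, like the other
 * static mask variables; no memset() needed inside the function. */
static const struct set_ipv6 set_ipv6_mask = {
	.ipv6_addr = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	},
};
```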

### 8. `actions_counter` is `uint8_t` — can overflow with many actions

In `fill_actions_template()`, `actions_counter` is declared as `uint8_t`, which 
limits it to 255 entries. While `MAX_ACTIONS_NUM` is likely smaller, there's no 
bounds check against the `actions[]` and `masks[]` array sizes. If the 
template_actions table plus the END sentinel exceeds the caller's array, this 
silently writes out of bounds.

### 9. Unnecessary `rte_zmalloc` for `results` array in `insert_flows_async()`

```c
results = rte_zmalloc("results",
    sizeof(struct rte_flow_op_result) * async_push_batch, 0);
```

Per AGENTS.md guidelines, `rte_malloc` is for DMA-accessible or shared memory. 
The `results` array is only used locally by the CPU for pulling flow operation 
results. Standard `malloc` (or even a stack allocation if `async_push_batch` is 
bounded) would be more appropriate and wouldn't consume limited hugepage memory.

### 10. Gratuitous reformatting of the `lgopts[]` table

The patch reformats the entire existing `lgopts[]` table from `{ "name", ...}` 
with alignment spaces to `{"name", ...}` without alignment. This changes 60+ 
lines that have nothing to do with the async feature, making the diff much 
harder to review and potentially causing merge conflicts with other in-flight 
patches.

This should be a separate cleanup patch, or the new entries should match the 
existing style.

### 11. `rte_zmalloc` for `flows_list` — unnecessary hugepage usage

```c
flows_list = rte_zmalloc("flows_list",
    (sizeof(struct rte_flow *) * (rules_count_per_core + 1)), 0);
```

This is a pointer array used only for bookkeeping. Standard `calloc` is 
appropriate here. (Note: the existing `insert_flows()` has the same pattern, so 
this may be intentional consistency, but it's still worth noting.)

### 12. Division by zero if `cpu_time_used` is zero

In `insert_flows_async()`:
```c
insertion_rate = ((double)(rules_count_per_core / cpu_time_used) / 1000);
```

If the loop completes very quickly or `rules_count_per_core` is 0, 
`cpu_time_used` could be 0, causing a floating-point division by zero 
(infinity/NaN). Additionally, the integer division `rules_count_per_core / 
cpu_time_used` truncates — it should be `(double)rules_count_per_core / 
cpu_time_used` to get an accurate rate.
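A minimal guarded version of the computation (the helper name is illustrative):

```c
#include <stdint.h>

/* Insertion rate in Krules/sec; guards the zero-time case and casts
 * before dividing so the integer division cannot truncate. */
static double insertion_rate_krules(uint32_t rules_count, double cpu_time_used)
{
	if (cpu_time_used <= 0.0)
		return 0.0;
	return ((double)rules_count / cpu_time_used) / 1000.0;
}
```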

### 13. Commit message body line length

The string `"Encapped data is fixed with pattern: ether,ipv4,udp,vxlan"` 
appears in the `usage()` help text additions within the code, where length is 
not a concern. In the commit message body itself, the `--async-queue-size=N:` 
line is tight but stays within the line-length limit.

### 14. Copyright uses "Mellanox Technologies" — potentially outdated

The new file `async_flow.c` and `async_flow.h` use `Copyright 2026 Mellanox 
Technologies, Ltd`. The author's email is `@gmail.com` (individual), and 
Mellanox was acquired by NVIDIA in 2020. This is noted only as an observation 
per AGENTS.md — the copyright holder's choice is not subject to review.

---

## Summary

The most critical finding is **#3** — the `flow` variable race condition in the 
multi-core `insert_flows_async()` path. If multiple cores run this function 
concurrently (which is the intended use), they would share and clobber a 
file-scope `flow` pointer. This is a real correctness bug that would cause 
intermittent failures.

The **#1** `alloca()` size mismatch is also a clear bug that could cause memory 
corruption depending on the struct size vs pointer size relationship on the 
target architecture.

The remaining issues range from stack safety (#2), error handling order (#4), 
to style/efficiency concerns (#7–#12).
