On Tue, 24 Feb 2026 11:56:47 +0100 Maxime Peim <[email protected]> wrote:
> Add async flow API mode to test-flow-perf application for improved
> flow rule insertion performance. The async API allows batching flow
> rule creation operations and processing completions in bulk, reducing
> per-rule overhead.
>
> New command line options:
> --async: enable async flow API mode
> --async-queue-size=N: size of async queues (default: 1024)
> --async-push-batch=N: flows to batch before push (default: 256)
>
> Signed-off-by: Maxime Peim <[email protected]>
> ---
>  app/test-flow-perf/actions_gen.c | 172 +++++++++++++
>  app/test-flow-perf/actions_gen.h |   4 +
>  app/test-flow-perf/async_flow.c  | 239 ++++++++++++++++++
>  app/test-flow-perf/async_flow.h  |  41 ++++
>  app/test-flow-perf/items_gen.c   |  13 +
>  app/test-flow-perf/items_gen.h   |   4 +
>  app/test-flow-perf/main.c        | 410 ++++++++++++++++++++++++-------
>  app/test-flow-perf/meson.build   |   1 +
>  8 files changed, 798 insertions(+), 86 deletions(-)
>  create mode 100644 app/test-flow-perf/async_flow.c
>  create mode 100644 app/test-flow-perf/async_flow.h

Looked good to me, but a more detailed AI-assisted dive found some things.

The main finding: a race condition on the `flow` variable — insert_flows_async() is called concurrently from multiple cores but uses a file-scope static `flow` pointer. One core can overwrite it between async_generate_flow() and the flows_list assignment. It needs a local `struct rte_flow *flow;` declaration.

But it found a lot more that you should address:

# Review: [PATCH] test/flow: add support for async API

**Patch**: test/flow: add support for async API
**Author**: Maxime Peim <[email protected]>
**Files changed**: 8 (798 insertions, 86 deletions)

---

## Errors

### 1. `alloca()` for `queue_attr_list` allocates the wrong size (buffer overflow)
In `async_flow.c`, `async_flow_init_port()`:

```c
const struct rte_flow_queue_attr **queue_attr_list =
	alloca(sizeof(struct rte_flow_queue_attr) * nb_queues);
```

This allocates `nb_queues * sizeof(struct rte_flow_queue_attr)` bytes, but `queue_attr_list` is an array of **pointers** (`const struct rte_flow_queue_attr **`). It should allocate `nb_queues * sizeof(struct rte_flow_queue_attr *)`. If `sizeof(struct rte_flow_queue_attr) < sizeof(pointer)`, this is a buffer overflow when iterating over `nb_queues` entries. If `sizeof(struct rte_flow_queue_attr) > sizeof(pointer)`, it wastes stack space but is not dangerous.

**Fix**:

```c
const struct rte_flow_queue_attr **queue_attr_list =
	alloca(sizeof(struct rte_flow_queue_attr *) * nb_queues);
```

Or, more idiomatically: `alloca(sizeof(*queue_attr_list) * nb_queues)`.

### 2. `alloca()` with unbounded `nb_queues` — potential stack overflow

`nb_queues` comes from `mc_pool.cores_count`, which is user-controlled via `--cores`. While it is capped by `port_info.max_nb_queues` if the device reports it, if `max_nb_queues` is 0 or `UINT32_MAX` (which the code explicitly skips), there is no upper bound enforced before `alloca()`. A large value could blow the stack.

**Fix**: Add a reasonable upper bound check before the `alloca()` calls, or use `rte_malloc`/`malloc` instead.

### 3. `flow` variable used in `insert_flows_async()` without declaration

The function `insert_flows_async()` in `main.c` references `flow` (e.g., line 994: `flow = generate_flow(...)`, line 1014: `flow = async_generate_flow(...)`), but `flow` is never declared as a local variable within this function. It appears to rely on a file-scope `static struct rte_flow *flow` variable.

This creates a hidden data race: if multiple cores call `insert_flows_async()` concurrently (which they do — `flows_handler` is called per-core), they share the same `flow` pointer without synchronization.
One core could overwrite `flow` between the `async_generate_flow()` call and the `flows_list[flow_index++] = flow` assignment.

**Fix**: Declare `struct rte_flow *flow;` as a local variable inside `insert_flows_async()`.

### 4. Missing NULL check on `flow` before measuring first-flow latency

In `insert_flows_async()`:

```c
flow = async_generate_flow(...);
if (counter == start_counter) {
	first_flow_latency = ...;
	printf(":: First Flow Latency ...\n");
}
if (force_quit)
	break;
if (!flow) {
	print_flow_error(error);
	rte_exit(EXIT_FAILURE, ...);
}
```

The `flow == NULL` check happens **after** the first-flow latency measurement and the `force_quit` check. If the first flow creation fails (`flow == NULL`) and `force_quit` is set, the code breaks out of the loop and proceeds to use `flows_list`, which has a NULL entry at index 0, potentially causing issues during cleanup. More critically, if it's the first flow and it fails, the code prints a misleading "First Flow installed" latency message before discovering the failure.

**Fix**: Move the `if (!flow)` check immediately after `async_generate_flow()`, before the latency print.

### 5. `port_attr.nb_counters` not set — may cause configure failure

In `async_flow_init_port()`, `port_attr` is zero-initialized, but `nb_counters` and `nb_aging_objects` are not set even though the flow actions may include `COUNT` or metering. Some PMDs require these to be nonzero when counters are used with async flows. This could cause silent failures, or cause `rte_flow_configure` to reject the configuration.

This is a moderate-confidence concern (~60%) — it depends on the specific PMD, but the code does not attempt to set these fields at all.

### 6. `#include "rte_common.h"` — incorrect include style

In `main.c`:

```c
#include "rte_common.h"
```

DPDK public headers should be included with angle brackets: `#include <rte_common.h>`. Using quotes searches the local directory first, which is incorrect for a system/library header.
This could also fail to compile if the local directory does not have a copy.

---

## Warnings

### 7. `memset` on `set_ipv6_mask` inside function body with `static` variable

In `fill_actions_template()`:

```c
static struct rte_flow_action_set_ipv6 set_ipv6_mask;
/* ... */
memset(set_ipv6_mask.ipv6_addr.a, 0xff, 16);
```

The `memset` is called **every time** `fill_actions_template()` is invoked, but the variable is `static`. This is redundant after the first call and suggests the initialization should either be done once (with a static flag or at file scope) or the variable should not be `static`. It is not a bug, but it is wasteful and indicates a design inconsistency — all other static mask variables are initialized at declaration.

### 8. `actions_counter` is `uint8_t` — can overflow with many actions

In `fill_actions_template()`, `actions_counter` is declared as `uint8_t`, which limits it to 255 entries. While `MAX_ACTIONS_NUM` is likely smaller, there is no bounds check against the `actions[]` and `masks[]` array sizes. If the template_actions table plus the END sentinel exceeds the caller's array, this silently writes out of bounds.

### 9. Unnecessary `rte_zmalloc` for `results` array in `insert_flows_async()`

```c
results = rte_zmalloc("results",
	sizeof(struct rte_flow_op_result) * async_push_batch, 0);
```

Per AGENTS.md guidelines, `rte_malloc` is for DMA-accessible or shared memory. The `results` array is only used locally by the CPU for pulling flow operation results. Standard `malloc` (or even a stack allocation, if `async_push_batch` is bounded) would be more appropriate and would not consume limited hugepage memory.

### 10. Gratuitous reformatting of the `lgopts[]` table

The patch reformats the entire existing `lgopts[]` table from `{ "name", ...}` with alignment spaces to `{"name", ...}` without alignment.
This changes 60+ lines that have nothing to do with the async feature, making the diff much harder to review and potentially causing merge conflicts with other in-flight patches. This should be a separate cleanup patch, or the new entries should match the existing style.

### 11. `rte_zmalloc` for `flows_list` — unnecessary hugepage usage

```c
flows_list = rte_zmalloc("flows_list",
	(sizeof(struct rte_flow *) * (rules_count_per_core + 1)), 0);
```

This is a pointer array used only for bookkeeping. Standard `calloc` is appropriate here. (Note: the existing `insert_flows()` has the same pattern, so this may be intentional consistency, but it is still worth noting.)

### 12. Division by zero if `cpu_time_used` is zero

In `insert_flows_async()`:

```c
insertion_rate = ((double)(rules_count_per_core / cpu_time_used) / 1000);
```

If the loop completes very quickly or `rules_count_per_core` is 0, `cpu_time_used` could be 0, causing a floating-point division by zero (infinity/NaN). Additionally, the division `rules_count_per_core / cpu_time_used` happens before the cast — it should be `(double)rules_count_per_core / cpu_time_used` to get an accurate rate.

### 13. Commit message body line length

The line `"Encapped data is fixed with pattern: ether,ipv4,udp,vxlan"` appears in the usage() help text additions (within the code) and is fine there, but the actual commit message body should be checked — the `--async-queue-size=N:` line looks tight but is within limits.

### 14. Copyright uses "Mellanox Technologies" — potentially outdated

The new files `async_flow.c` and `async_flow.h` use `Copyright 2026 Mellanox Technologies, Ltd`. The author's email is `@gmail.com` (individual), and Mellanox was acquired by NVIDIA in 2020. This is noted only as an observation per AGENTS.md — the copyright holder's choice is not subject to review.

---

## Summary

The most critical finding is **#3** — the `flow` variable race condition in the multi-core `insert_flows_async()` path.
If multiple cores run this function concurrently (which is the intended use), they would share and clobber a file-scope `flow` pointer. This is a real correctness bug that would cause intermittent failures.

The **#1** `alloca()` size mismatch is also a clear bug that could cause memory corruption, depending on the struct size vs. pointer size relationship on the target architecture.

The remaining issues range from stack safety (#2) and error-handling order (#4) to style/efficiency concerns (#7–#12).

