Thanks for the notes. I've addressed the issues as follows and submitted a v2 revision.
[Patch 3/4]

1. Updated the copyright header in `gve_flow_rule.h`.

[Patch 4/4]

0. Similar to patch 3, updated the copyright header in `gve_flow_rule.c`.
1. Fixed the memory leak in rule flush by freeing the bitmap on flush
   failure. This preserves the invariant that `gve_get_flow_subsystem_ok`
   is true if and only if the flow ID bitmap and flow rule tail queue are
   initialized.
2. Added a `pthread_mutexattr_t` with `PTHREAD_PROCESS_SHARED` to the
   flow mutex initialization.
3. Changed the noted instances of `(!pointer)`, and a few others, to
   `pointer == NULL`.
4. Retained `rte_zmalloc` for both `struct gve_flow` and
   `priv->avail_flow_rule_bmp_mem`, because this memory is shared between
   processes.
5. Retained the memory barriers in case the flag checks are called from
   non-critical paths in the future.
6. Updated the documentation to use a definition list for the supported
   protocols, and made minor formatting adjustments so the definition
   list reads naturally in the flow steering subsection.

On Fri, Feb 27, 2026 at 2:52 PM Stephen Hemminger <[email protected]> wrote:
> On Fri, 27 Feb 2026 19:51:22 +0000
> "Jasper Tran O'Leary" <[email protected]> wrote:
>
> > This patch series adds flow steering support to the Google Virtual
> > Ethernet (gve) driver. This functionality allows traffic to be directed
> > to specific receive queues based on user-specified flow patterns.
> >
> > The series includes foundational support for the extended admin queue
> > commands needed to handle flow rules, the specific admin queue commands
> > for flow rule management, and the integration with the DPDK rte_flow
> > API. The series adds support for flow matching on the following
> > protocols: IPv4, IPv6, TCP, UDP, SCTP, ESP, and AH.
> >
> > Patch Overview:
> >
> > 1. "net/gve: add flow steering device option" checks for and enables
> >    the flow steering capability in the device options during
> >    initialization.
> > 2. "net/gve: introduce extended adminq command" adds infrastructure
> >    for sending extended admin queue commands. These commands use a
> >    flexible buffer descriptor format required for flow rule management.
> > 3. "net/gve: add adminq commands for flow steering" implements the
> >    specific admin queue commands to add and remove flow rules on the
> >    device, including handling of rule IDs and parameters.
> > 4. "net/gve: add rte flow API integration" exposes the flow steering
> >    functionality via the DPDK rte_flow API. This includes strict
> >    pattern validation, rule parsing, and lifecycle management (create,
> >    destroy, flush). It ensures thread-safe access to the flow subsystem
> >    and proper resource cleanup during device reset.
> >
> > Jasper Tran O'Leary (2):
> >   net/gve: add adminq commands for flow steering
> >   net/gve: add rte flow API integration
> >
> > Vee Agarwal (2):
> >   net/gve: add flow steering device option
> >   net/gve: introduce extended adminq command
> >
> >  doc/guides/nics/features/gve.ini       |  12 +
> >  doc/guides/nics/gve.rst                |  20 +
> >  doc/guides/rel_notes/release_26_03.rst |   1 +
> >  drivers/net/gve/base/gve.h             |   3 +-
> >  drivers/net/gve/base/gve_adminq.c      | 118 ++++-
> >  drivers/net/gve/base/gve_adminq.h      |  57 +++
> >  drivers/net/gve/gve_ethdev.c           |  87 +++-
> >  drivers/net/gve/gve_ethdev.h           |  46 ++
> >  drivers/net/gve/gve_flow_rule.c        | 645 +++++++++++++++++++++++++
> >  drivers/net/gve/gve_flow_rule.h        |  64 +++
> >  drivers/net/gve/meson.build            |   1 +
> >  11 files changed, 1049 insertions(+), 5 deletions(-)
> >  create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.c
> >  create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.h
>
> There is a lot here, so I sent AI to take a look.
>
> Summary:
>
> Error 1 (resource leak): If gve_flush_flow_rules fails and disables the
> flow subsystem via gve_clear_flow_subsystem_ok(), the bitmap memory
> (avail_flow_rule_bmp_mem) is never freed, because both gve_dev_close and
> gve_dev_reset gate their teardown on gve_get_flow_subsystem_ok()
> returning true. The guard needs to check for allocated memory rather
> than subsystem state.
>
> Error 2 (pthread_mutex_init with NULL attributes in shared memory): The
> flow_rule_lock lives in dev_private (DPDK shared memory) but is
> initialized without PTHREAD_PROCESS_SHARED. This will cause undefined
> behavior with secondary processes. Switching to rte_spinlock_t would be
> the simplest fix, since the critical sections are short (bitmap scan +
> TAILQ operations).
>
> Patches 1–3 are clean. The overall structure and error handling in
> patch 4 are solid: the allocation-before-lock pattern, bitmap rollback
> on adminq failure, and defense-in-depth validation in destroy are all
> well done.
>
> Long form:
>
> # Code Review: GVE Flow Steering Patch Series (4 patches)
>
> **Series**: `[PATCH 1/4]` through `[PATCH 4/4]`
> **Author**: Jasper Tran O'Leary / Vee Agarwal (Google)
> **Subject**: Add receive flow steering (RFS) support to the GVE driver
>
> ---
>
> ## Summary
>
> This series adds n-tuple flow steering to the GVE (Google Virtual
> Ethernet) driver via the `rte_flow` API. The implementation is cleanly
> structured across four patches: device option discovery (1/4), extended
> adminq infrastructure (2/4), flow rule adminq commands (3/4), and full
> rte_flow integration (4/4). The code quality is generally good, with
> thorough validation, proper locking, and well-structured error handling.
>
> Two correctness issues were identified: a resource leak when the flow
> subsystem is disabled on error, and a missing `PTHREAD_PROCESS_SHARED`
> attribute on a mutex in shared memory.
>
> ---
>
> ## Patch 1/4: `net/gve: add flow steering device option`
>
> ### Errors
>
> None.
>
> ### Warnings
>
> None.
>
> This patch is clean. The device option parsing follows the established
> pattern for existing options (modify_ring, jumbo_frames), the byte-swap
> of `max_flow_rules` is correct, and the zero-check on the big-endian
> value before the byte-swap is valid (non-zero in any byte order is
> still non-zero).
>
> ---
>
> ## Patch 2/4: `net/gve: introduce extended adminq command`
>
> ### Errors
>
> None.
>
> ### Warnings
>
> None.
>
> The extended command mechanism is straightforward: allocate DMA memory
> for the inner command, copy the command in, set up the outer wrapper
> with the DMA address, execute, and free. The error path is correct:
> `gve_free_dma_mem` is called after `gve_adminq_execute_cmd` regardless
> of success or failure.
>
> ---
>
> ## Patch 3/4: `net/gve: add adminq commands for flow steering`
>
> ### Errors
>
> None.
>
> ### Warnings
>
> **1. `gve_flow_rule.h` copyright appears to be copied from an Intel
> driver file.**
>
> ```c
> /* SPDX-License-Identifier: BSD-3-Clause
>  * Copyright(C) 2022 Intel Corporation
>  */
> ```
>
> This is a new file in the Google GVE driver. The Intel copyright and
> 2022 date look like they were carried over from whichever file was used
> as a template. This should be updated to reflect the actual author.
>
> ---
>
> ## Patch 4/4: `net/gve: add rte flow API integration`
>
> This is the largest patch (815 lines added) and where the significant
> findings are.
>
> ### Errors
>
> **1. Resource leak: bitmap memory leaked when the flow subsystem is
> disabled on error.**
>
> In `gve_flush_flow_rules`, if either `gve_free_flow_rules()` or
> `gve_flow_init_bmp()` fails, the code disables the subsystem:
>
> ```c
> gve_clear_flow_subsystem_ok(priv);
> ```
>
> However, in both `gve_dev_close` and `gve_dev_reset`, teardown is gated
> on the flag:
>
> ```c
> if (gve_get_flow_subsystem_ok(priv))
> 	gve_teardown_flow_subsystem(priv);
> ```
>
> If the subsystem was disabled by a failed flush,
> `gve_teardown_flow_subsystem` is never called, and
> `priv->avail_flow_rule_bmp_mem` is never freed. This is a memory leak.
>
> **Suggested fix**: Either (a) always call `gve_flow_free_bmp(priv)` in
> close/reset regardless of the flag, (b) have the flush error path free
> the bitmap memory itself, or (c) unconditionally call teardown in
> close/reset:
>
> ```c
> /* In gve_dev_close / gve_dev_reset: */
> if (priv->avail_flow_rule_bmp_mem != NULL)
> 	gve_teardown_flow_subsystem(priv);
> ```
>
> **2. `pthread_mutex_init` without `PTHREAD_PROCESS_SHARED` on a mutex
> in shared memory.**
>
> In `gve_dev_init`:
>
> ```c
> pthread_mutex_init(&priv->flow_rule_lock, NULL);
> ```
>
> `priv` is `dev->data->dev_private`, which is allocated in DPDK shared
> memory accessible by both primary and secondary processes. A pthread
> mutex in shared memory initialized with `NULL` attributes has undefined
> behavior when used across processes. It may appear to work in testing
> but fail in production with secondary processes.
>
> **Suggested fix**: Either use `PTHREAD_PROCESS_SHARED`:
>
> ```c
> pthread_mutexattr_t attr;
>
> pthread_mutexattr_init(&attr);
> pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> pthread_mutex_init(&priv->flow_rule_lock, &attr);
> pthread_mutexattr_destroy(&attr);
> ```
>
> or switch to `rte_spinlock_t`, which works correctly in shared memory
> without special initialization (appropriate here since the critical
> sections are short: a bitmap scan and TAILQ insert/remove):
>
> ```c
> rte_spinlock_t flow_rule_lock;
> /* ... */
> rte_spinlock_init(&priv->flow_rule_lock);
> ```
>
> ### Warnings
>
> **3. Implicit pointer comparisons throughout `gve_flow_rule.c`.**
>
> Several places use `!pointer` instead of `pointer == NULL`:
>
> ```c
> if (!flow) {         /* line ~1641, ~1710 */
> if (!action->conf) { /* line ~1510 */
> if (!attr) {         /* line ~1173 */
> if (!actions) {      /* line ~1498 */
> if (!pattern) {      /* line ~1329 */
> ```
>
> DPDK coding style requires explicit comparison with NULL for pointers.
> The idiomatic form is `if (flow == NULL)`.
>
> **4. `rte_zmalloc` used for flow rule metadata that does not require
> hugepage memory.**
>
> The `gve_flow` structs (a `uint32_t` plus TAILQ linkage) are allocated
> via `rte_zmalloc`. These are small control-plane structures, not
> accessed by DMA and not requiring NUMA placement. Standard `malloc`
> would be more appropriate per DPDK guidelines and would not consume
> limited hugepage resources.
>
> Similarly, `priv->avail_flow_rule_bmp_mem` is allocated with
> `rte_zmalloc`. Since `rte_bitmap` may require specific alignment, this
> one is more defensible, but it is worth considering whether standard
> allocation would suffice.
>
> **5. Standalone `rte_atomic_thread_fence()` in flow subsystem flag
> accessors.**
>
> The `gve_get_flow_subsystem_ok` / `gve_set_flow_subsystem_ok` /
> `gve_clear_flow_subsystem_ok` functions use standalone fences with
> `rte_bit_relaxed_*` operations:
>
> ```c
> static inline bool
> gve_get_flow_subsystem_ok(struct gve_priv *priv)
> {
> 	bool ret;
>
> 	ret = !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_FLOW_SUBSYSTEM_OK,
> 				      &priv->state_flags);
> 	rte_atomic_thread_fence(rte_memory_order_acquire);
> 	return ret;
> }
> ```
>
> These follow the existing pattern for other state flags in the driver,
> so this is consistent. However, it is worth noting that at every call
> site in this patch, the flag is checked while holding `flow_rule_lock`,
> which already provides the necessary memory ordering. The fences are
> redundant in those paths (but harmless).
>
> **6. RST documentation uses simple bullet lists where definition lists
> would be cleaner.**
>
> In `doc/guides/nics/gve.rst`:
>
> ```rst
> Supported Patterns:
>
> - IPv4/IPv6 source and destination addresses.
> - TCP/UDP/SCTP source and destination ports.
> - ESP/AH SPI.
> ```
>
> These term-plus-list groupings would read better as RST definition
> lists. This is minor, given the lists are short.
>
> ### Design Notes (Info)
>
> **Well-structured error handling in `gve_create_flow_rule`**: The
> allocation-before-lock pattern avoids holding the mutex during memory
> allocation, and the `free_flow_and_unlock` goto label correctly handles
> all error paths. The bitmap-set-on-error rollback in the adminq failure
> case is also correct.
>
> **`gve_destroy_flow_rule` double-validates flow state**: The function
> checks the flow pointer for NULL and verifies
> `rule_id < max_flow_rules` before checking the bitmap. This
> defense-in-depth is good practice.
>
> **IPv6 word-reversal in `gve_parse_ipv6`**: The comment explaining the
> device's expected word order is clear, and the implementation correctly
> reverses the 32-bit words. This is the kind of hardware-specific detail
> that benefits from an inline comment.
>
> ---
>
> ## Cross-Patch Observations
>
> The series is well-ordered: each patch builds incrementally and the
> dependencies flow naturally (device option → extended command
> infrastructure → flow rule commands → rte_flow integration). The
> documentation, feature matrix, and release notes are all updated in
> patch 4/4 together with the code, which is correct per DPDK guidelines.
>
> The `Co-developed-by` / `Signed-off-by` tag sequences are correctly
> formatted per Linux kernel convention.
>
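And on changelog item 6: the definition-list form in RST looks roughly
like this. This is a sketch of the shape only; the exact wording in the
v2 docs may differ.

```rst
IPv4, IPv6
  Matching on source and destination addresses.

TCP, UDP, SCTP
  Matching on source and destination ports.

ESP, AH
  Matching on the SPI.
```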

