gve: add flow steering support

Stephen Hemminger Fri, 27 Feb 2026 14:52:52 -0800

On Fri, 27 Feb 2026 19:51:22 +0000
"Jasper Tran O'Leary" <[email protected]> wrote:


> This patch series adds flow steering support to the Google Virtual
> Ethernet (gve) driver. This functionality allows traffic to be directed
> to specific receive queues based on user-specified flow patterns.
> 
> The series includes foundational support for extended admin queue
> commands needed to handle flow rules, the specific adminqueue commands
> for flow rule management, and the integration with the DPDK rte_flow
> API. The series adds support flow matching on the following protocols:
> IPv4, IPv6, TCP, UDP, SCTP, ESP, and AH.
> 
> Patch Overview:
> 
> 1. "net/gve: add flow steering device option" checks for and enables
>    the flow steering capability in the device options during
>    initialization.
> 2. "net/gve: introduce extended adminq command" adds infrastructure
>    for sending extended admin queue commands. These commands use a
>    flexible buffer descriptor format required for flow rule management.
> 3. "net/gve: add adminq commands for flow steering" implements the
>    specific admin queue commands to add and remove flow rules on the
>    device, including handling of rule IDs and parameters.
> 4. "net/gve: add rte flow API integration" exposes the flow steering
>    functionality via the DPDK rte_flow API. This includes strict
>    pattern validation, rule parsing, and lifecycle management (create,
>    destroy, flush). It ensures thread-safe access to the flow subsystem
>    and proper resource cleanup during device reset.
> 
> Jasper Tran O'Leary (2):
>   net/gve: add adminq commands for flow steering
>   net/gve: add rte flow API integration
> 
> Vee Agarwal (2):
>   net/gve: add flow steering device option
>   net/gve: introduce extended adminq command
> 
>  doc/guides/nics/features/gve.ini       |  12 +
>  doc/guides/nics/gve.rst                |  20 +
>  doc/guides/rel_notes/release_26_03.rst |   1 +
>  drivers/net/gve/base/gve.h             |   3 +-
>  drivers/net/gve/base/gve_adminq.c      | 118 ++++-
>  drivers/net/gve/base/gve_adminq.h      |  57 +++
>  drivers/net/gve/gve_ethdev.c           |  87 +++-
>  drivers/net/gve/gve_ethdev.h           |  46 ++
>  drivers/net/gve/gve_flow_rule.c        | 645 +++++++++++++++++++++++++
>  drivers/net/gve/gve_flow_rule.h        |  64 +++
>  drivers/net/gve/meson.build            |   1 +
>  11 files changed, 1049 insertions(+), 5 deletions(-)
>  create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.c
>  create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.h
> 

There is a lot here, so sent AI to take a look.

Summary:
Error 1 — Resource leak: If gve_flush_flow_rules fails and disables the
flow subsystem via gve_clear_flow_subsystem_ok(), the bitmap memory
(avail_flow_rule_bmp_mem) is never freed because both gve_dev_close and
gve_dev_reset gate their teardown on gve_get_flow_subsystem_ok()
returning true. The guard needs to check for allocated memory rather
than subsystem state.

Error 2 — pthread_mutex_init with NULL attributes in shared memory: The
flow_rule_lock lives in dev_private (DPDK shared memory) but is
initialized without PTHREAD_PROCESS_SHARED. This will cause undefined
behavior with secondary processes. Switching to rte_spinlock_t would be
the simplest fix since the critical sections are short (bitmap scan +
TAILQ operations).

Patches 1–3 are clean. The overall structure and error handling in patch 4 is 
solid — the allocation-before-lock pattern, bitmap rollback on adminq failure, 
and defense-in-depth validation in destroy are all well done.
Long form:
# Code Review: GVE Flow Steering Patch Series (4 patches)

**Series**: `[PATCH 1/4]` through `[PATCH 4/4]`
**Author**: Jasper Tran O'Leary / Vee Agarwal (Google)
**Subject**: Add receive flow steering (RFS) support to the GVE driver

---

## Summary

This series adds n-tuple flow steering to the GVE (Google Virtual Ethernet) 
driver via the `rte_flow` API. The implementation is cleanly structured across 
four patches: device option discovery (1/4), extended adminq infrastructure 
(2/4), flow rule adminq commands (3/4), and full rte_flow integration (4/4). 
The code quality is generally good with thorough validation, proper locking, 
and well-structured error handling.

Two correctness issues were identified: a resource leak when the flow subsystem 
is disabled on error, and a missing `PTHREAD_PROCESS_SHARED` attribute on a 
mutex in shared memory.

---

## Patch 1/4: `net/gve: add flow steering device option`

### Errors

None.

### Warnings

None.

This patch is clean. The device option parsing follows the established pattern 
for existing options (modify_ring, jumbo_frames), the byte-swap of 
`max_flow_rules` is correct, and the zero-check on the big-endian value before 
byte-swap is valid (non-zero in any byte order is still non-zero).

---

## Patch 2/4: `net/gve: introduce extended adminq command`

### Errors

None.

### Warnings

None.

The extended command mechanism is straightforward: allocate DMA memory for the 
inner command, copy the command in, set up the outer wrapper with the DMA 
address, execute, and free. The error path is correct — `gve_free_dma_mem` is 
called after `gve_adminq_execute_cmd` regardless of success or failure.

---

## Patch 3/4: `net/gve: add adminq commands for flow steering`

### Errors

None.

### Warnings

**1. `gve_flow_rule.h` copyright appears to be copied from an Intel driver 
file.**

```c
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(C) 2022 Intel Corporation
 */
```

This is a new file in the Google GVE driver. The Intel copyright and 2022 date 
look like they were carried over from whichever file was used as a template. 
This should be updated to reflect the actual author.

---

## Patch 4/4: `net/gve: add rte flow API integration`

This is the largest patch (815 lines added) and where the significant findings 
are.

### Errors

**1. Resource leak: bitmap memory leaked when flow subsystem is disabled on 
error.**

In `gve_flush_flow_rules`, if either `gve_free_flow_rules()` or 
`gve_flow_init_bmp()` fails, the code disables the subsystem:

```c
gve_clear_flow_subsystem_ok(priv);
```

However, in both `gve_dev_close` and `gve_dev_reset`, teardown is gated on the 
flag:

```c
if (gve_get_flow_subsystem_ok(priv))
    gve_teardown_flow_subsystem(priv);
```

If the subsystem was disabled by a failed flush, `gve_teardown_flow_subsystem` 
is never called, and `priv->avail_flow_rule_bmp_mem` is never freed. This is a 
memory leak.

**Suggested fix**: Either (a) always call `gve_flow_free_bmp(priv)` in 
close/reset regardless of the flag, or (b) have the flush error path free the 
bitmap memory itself, or (c) unconditionally call teardown in close/reset:

```c
/* In gve_dev_close / gve_dev_reset: */
if (priv->avail_flow_rule_bmp_mem)
    gve_teardown_flow_subsystem(priv);
```

**2. `pthread_mutex_init` without `PTHREAD_PROCESS_SHARED` on a mutex in shared 
memory.**

In `gve_dev_init`:

```c
pthread_mutex_init(&priv->flow_rule_lock, NULL);
```

`priv` is `dev->data->dev_private`, which is allocated in DPDK shared memory 
accessible by both primary and secondary processes. A pthread mutex in shared 
memory initialized with `NULL` attributes has undefined behavior when used 
across processes. It may appear to work in testing but fail in production with 
secondary processes.

**Suggested fix**: Either use `PTHREAD_PROCESS_SHARED`:

```c
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&priv->flow_rule_lock, &attr);
pthread_mutexattr_destroy(&attr);
```

Or switch to `rte_spinlock_t` which works correctly in shared memory without 
special initialization (appropriate here since the critical sections are short 
— bitmap scan, TAILQ insert/remove):

```c
rte_spinlock_t flow_rule_lock;
/* ... */
rte_spinlock_init(&priv->flow_rule_lock);
```

### Warnings

**3. Implicit pointer comparisons throughout `gve_flow_rule.c`.**

Several places use `!pointer` instead of `pointer == NULL`:

```c
if (!flow) {          /* line ~1641, ~1710 */
if (!action->conf) {  /* line ~1510 */
if (!attr) {          /* line ~1173 */
if (!actions) {       /* line ~1498 */
if (!pattern) {       /* line ~1329 */
```

DPDK coding style requires explicit comparison with NULL for pointers. The 
idiomatic form is `if (flow == NULL)`.

**4. `rte_zmalloc` used for flow rule metadata that does not require hugepage 
memory.**

The `gve_flow` structs (8 bytes: a `uint32_t` + TAILQ pointers) are allocated 
via `rte_zmalloc`. These are small control-plane structures not accessed by DMA 
and not requiring NUMA placement. Standard `malloc` would be more appropriate 
per DPDK guidelines and would not consume limited hugepage resources.

Similarly, `priv->avail_flow_rule_bmp_mem` is allocated with `rte_zmalloc`. 
Since `rte_bitmap` may require specific alignment, this one is more defensible, 
but worth considering whether standard allocation would suffice.

**5. Standalone `rte_atomic_thread_fence()` in flow subsystem flag accessors.**

The `gve_get_flow_subsystem_ok` / `gve_set_flow_subsystem_ok` / 
`gve_clear_flow_subsystem_ok` functions use standalone fences with 
`rte_bit_relaxed_*` operations:

```c
static inline bool
gve_get_flow_subsystem_ok(struct gve_priv *priv)
{
    bool ret;
    ret = !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_FLOW_SUBSYSTEM_OK,
                                  &priv->state_flags);
    rte_atomic_thread_fence(rte_memory_order_acquire);
    return ret;
}
```

These follow the existing pattern for other state flags in the driver, so this 
is consistent. However, it's worth noting that in every call site in this 
patch, the flag is checked while holding `flow_rule_lock`, which already 
provides the necessary memory ordering. The fences are redundant in those paths 
(but harmless).

**6. RST documentation uses simple bullet lists where definition lists would be 
cleaner.**

In `doc/guides/nics/gve.rst`:

```rst
Supported Patterns:
  - IPv4/IPv6 source and destination addresses.
  - TCP/UDP/SCTP source and destination ports.
  - ESP/AH SPI.
```

These term+list groupings would read better as RST definition lists. This is 
minor given the lists are short.

### Design Notes (Info)

**Well-structured error handling in `gve_create_flow_rule`**: The 
allocation-before-lock pattern avoids holding the mutex during memory 
allocation, and the `free_flow_and_unlock` goto label correctly handles all 
error paths. The bitmap-set-on-error rollback in the adminq failure case is 
also correct.

**`gve_destroy_flow_rule` double-validates flow state**: The function checks 
both the flow pointer for NULL and verifies `rule_id < max_flow_rules` before 
checking the bitmap. This defense-in-depth is good practice.

**IPv6 word-reversal in `gve_parse_ipv6`**: The comment explaining the device's 
expected word order is clear and the implementation correctly reverses the 
32-bit words. This is the kind of hardware-specific detail that benefits from 
the inline comment.

---

## Cross-Patch Observations

The series is well-ordered: each patch builds incrementally and the 
dependencies flow naturally (device option → extended command infrastructure → 
flow rule commands → rte_flow integration). The documentation, feature matrix, 
and release notes are all updated in patch 4/4 together with the code, which is 
correct per DPDK guidelines.

The `Co-developed-by` / `Signed-off-by` tag sequences are correctly formatted 
per Linux kernel convention.

Re: [PATCH 0/4] net/gve: add flow steering support

Reply via email to