On Tue, 12 May 2026 00:53:47 +0000
Mark Blasko <[email protected]> wrote:
> This patch series introduces support for GVE hardware timestamping on DQO
> queues. To support concurrent access, a mutex lock is introduced to protect
> admin queue operations. A mechanism is then added to periodically synchronize
> the NIC clock via AdminQ, and support is introduced for the read_clock ethdev
> operation. Finally, the RX datapath is updated to reconstruct full 64-bit
> timestamps from the 32-bit values in DQO descriptors.
>
> Mark Blasko (6):
> net/gve: add thread safety to admin queue
> net/gve: add device option support for HW timestamps
> net/gve: add AdminQ command for NIC timestamps
> net/gve: add periodic NIC clock synchronization
> net/gve: support read clock ethdev op
> net/gve: reconstruct HW timestamps from DQO
>
> .mailmap | 1 +
> doc/guides/nics/features/gve.ini | 1 +
> doc/guides/nics/gve.rst | 18 +++
> doc/guides/rel_notes/release_26_07.rst | 3 +
> drivers/net/gve/base/gve_adminq.c | 127 +++++++++++++++++----
> drivers/net/gve/base/gve_adminq.h | 29 +++++
> drivers/net/gve/base/gve_desc_dqo.h | 8 +-
> drivers/net/gve/gve_ethdev.c | 148 ++++++++++++++++++++++++-
> drivers/net/gve/gve_ethdev.h | 39 +++++++
> drivers/net/gve/gve_rx_dqo.c | 26 +++++
> 10 files changed, 378 insertions(+), 22 deletions(-)
>
Reran review with new version..
Reviewed the v3 series against current main. Findings on 4/6 and 5/6
below; patches 1, 2, 3, and 6 look good.
[PATCH v3 4/6] net/gve: add periodic NIC clock synchronization
Warning: gve_read_nic_clock() runs as an rte_alarm callback, i.e. on the
shared EAL interrupt thread, and calls gve_adminq_report_nic_timestamp()
-> gve_adminq_kick_and_wait(), which busy-waits via rte_delay_ms() up to
GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK * GVE_ADMINQ_SLEEP_LEN = 100 * 20ms =
2s on an AdminQ timeout (tens of ms in the normal case). Blocking the
interrupt thread that long stalls link-status/reset detection and every
other device's interrupt and alarm handling for the whole process. The
existing gve_check_device_status() alarm only does a single ioread32be(),
so this is new behavior for the driver. Consider running the periodic
read off a dedicated control thread, or otherwise bounding the time spent
on the interrupt thread.
The teardown ordering itself is fine: rte_eal_alarm_cancel() is called
before gve_free_nic_ts_report(), and its spin-until-not-executing
semantics catch the self-rescheduled alarm, since the new entry is queued
during the callback before the dispatcher removes the executing one. No
use-after-free on the memzone there.
[PATCH v3 5/6] net/gve: support read clock ethdev op
Error: priv->nic_ts_lock is locked before it is initialized. In
gve_dev_init() the order is:
err = gve_init_priv(priv, false);
...
pthread_mutex_init(&priv->nic_ts_lock, &mutexattr);
but gve_init_priv() -> gve_setup_device_resources() ->
gve_setup_nic_timestamp() calls gve_read_nic_clock() synchronously when
the device reports NIC-timestamp support, and after this patch
gve_read_nic_clock() takes priv->nic_ts_lock. So on timestamp-capable
hardware the first lock runs on an uninitialized mutex, and the later
pthread_mutex_init() then re-initializes an already-used mutex - both
undefined behavior. It only appears to work because dev_private is zeroed
(a zeroed pthread_mutex_t happens to be a valid default mutex on glibc).
Initialize nic_ts_lock (and the mutexattr) before the gve_init_priv()
call.
Error: priv->nic_ts_lock is destroyed before the alarm that uses it is
cancelled. In gve_dev_close():
pthread_mutex_destroy(&priv->nic_ts_lock);
gve_free_queues(dev);
gve_teardown_device_resources(priv); /* cancels gve_read_nic_clock
alarm */
The periodic gve_read_nic_clock() alarm is still armed when the mutex is
destroyed, and that callback locks nic_ts_lock; if it fires in the window
before gve_teardown_device_resources() cancels it, it locks a destroyed
mutex. Move the pthread_mutex_destroy(&priv->nic_ts_lock) to after
gve_teardown_device_resources() returns.
The two 5/6 errors are the same root cause from opposite ends: the
nic_ts_lock lifetime needs to bracket all its users - initialized before
the synchronous setup-time read, and destroyed only after the alarm is
cancelled.