From: lilinhu <[email protected]>
flow_dv_counter_free() inserts counters into
pool->counters[pool->query_gen] under pool->csl. Meanwhile,
mlx5_flow_async_pool_query_handle() moves counters from
pool->counters[query_gen ^ 1] to the global free list via
TAILQ_CONCAT while holding only cmng->csl, not pool->csl.
The comment in flow_dv_counter_free() claims the lock is not needed
because the query callback and the release function operate on
different lists. That holds only if the free path always observes
the up-to-date query_gen. It can be violated:
1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
pool->query_gen == 0 and is about to insert into counters[0].
2. The free thread is preempted by the OS scheduler; it is a regular
pthread, not pinned to a core.
3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
query is sent.
4. Hardware completes the query and the callback runs TAILQ_CONCAT on
counters[0] (= query_gen ^ 1).
5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
concurrently with step 4 on another core.
Because the two paths take different locks, TAILQ_INSERT_TAIL and
TAILQ_CONCAT run concurrently on the same list with no
synchronization and corrupt it: the pool-local list ends up with a
NULL head but a dangling tqh_last, and the global free list tail no
longer points to the real tail. The just-freed counter and every
counter inserted afterwards become unreachable and are leaked.
Non-PMD threads can be preempted for hundreds of microseconds under
CPU pressure, which is well within the async query round-trip time,
so the window is reachable in practice.
Fix it by taking pool->csl in the query completion callback before
operating on pool->counters[query_gen], serializing the CONCAT with
any concurrent INSERT. The lock is taken once per pool per query
completion in the eal-intr-thread context, not on the datapath, so
the cost is negligible. Lock order is pool->csl then cmng->csl,
matching all other sites.
Also handle the error path: previously the counters accumulated in
pool->counters[query_gen] were abandoned when a query failed. Move
them back to the global free list to avoid a leak on persistent
query failures.
Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
Cc: [email protected]
Signed-off-by: lilinhu <[email protected]>
---
drivers/net/mlx5/mlx5_flow.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 915ea29a5a..20aad87f5d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9904,6 +9904,20 @@ mlx5_flow_async_pool_query_handle(struct
mlx5_dev_ctx_shared *sh,
if (unlikely(status)) {
raw_to_free = pool->raw_hw;
+ /*
+ * The query failed, so the freed counters accumulated in the
+ * old-gen list during this round would otherwise be stranded.
+ * Move them back to the global free list to avoid a leak when
+ * queries fail persistently.
+ */
+ if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+ rte_spinlock_lock(&pool->csl);
+ rte_spinlock_lock(&cmng->csl[cnt_type]);
+ TAILQ_CONCAT(&cmng->counters[cnt_type],
+ &pool->counters[query_gen], next);
+ rte_spinlock_unlock(&cmng->csl[cnt_type]);
+ rte_spinlock_unlock(&pool->csl);
+ }
} else {
raw_to_free = pool->raw;
if (pool->is_aged)
@@ -9913,11 +9927,20 @@ mlx5_flow_async_pool_query_handle(struct
mlx5_dev_ctx_shared *sh,
rte_spinlock_unlock(&pool->sl);
/* Be sure the new raw counters data is updated in memory. */
rte_io_wmb();
+ /*
+ * A counter free thread may have read a stale query_gen
+ * before the generation was flipped and could still be
+ * inserting into this same old-gen list. Hold pool->csl to
+ * serialize TAILQ_CONCAT with that TAILQ_INSERT_TAIL and
+ * avoid corrupting the list.
+ */
if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+ rte_spinlock_lock(&pool->csl);
rte_spinlock_lock(&cmng->csl[cnt_type]);
TAILQ_CONCAT(&cmng->counters[cnt_type],
&pool->counters[query_gen], next);
rte_spinlock_unlock(&cmng->csl[cnt_type]);
+ rte_spinlock_unlock(&pool->csl);
}
}
LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
--
2.39.3 (Apple Git-146)