On Fri, 26 Jun 2026 at 12:34, Anatoly Burakov <[email protected]> wrote:
>
> When rte_mp_request_async() fails to send requests to all peers,
> copy and param can lose ownership and leak.
>
> However, we cannot simply free them unconditionally, as "partial failure"
> means some requests were already queued and thus still reference `copy` and
> `param`, so freeing them directly on the error path can cause
> use-after-free when those requests are later handled by the async timeout.
>
> Fix this by rolling back queued requests from the current batch, and reset
> nb_sent to 0. Freeing the requests is now safe even if some requests were
> sent, as any responses or timeouts will not find the request ID in the
> queue and will safely exit without doing anything.
>
> Coverity issue: 501503
>
> Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
> Cc: [email protected]
>
> Signed-off-by: Anatoly Burakov <[email protected]>
> ---
> lib/eal/common/eal_common_proc.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/lib/eal/common/eal_common_proc.c
> b/lib/eal/common/eal_common_proc.c
> index 991bf215a3..0cffc7a127 100644
> --- a/lib/eal/common/eal_common_proc.c
> +++ b/lib/eal/common/eal_common_proc.c
> @@ -1245,6 +1245,32 @@ rte_mp_request_async(struct rte_mp_msg *req, const
> struct timespec *ts,
> } else if (mp_request_async(path, copy, param, ts))
> ret = -1;
> }
> +
> + /*
> + * On partial failure, roll back all queued requests. We hold the lock
> + * so no one else touches the queue. All requests in this batch share
> + * the same param pointer. Stale alarms will fire and harmlessly find
> + * nothing via ID-based lookup.
> + */
> + if (ret != 0 && reply->nb_sent > 0) {
> + struct pending_request *r, *next;
> +
> + for (r = TAILQ_FIRST(&pending_requests.requests);
> + r != NULL; r = next) {
> + next = TAILQ_NEXT(r, next);
> + if (r->type == REQUEST_TYPE_ASYNC &&
> + r->async.param == param) {
> + TAILQ_REMOVE(&pending_requests.requests,
> + r, next);
> + free(r->reply);
> + /* r->request == copy, freed below after the
> loop */
> + free(r);
> + }
> + }
> + /* requests on the queue were removed so keep things
> consistent */
> + reply->nb_sent = 0;
> + }
> +
Please, don't reimplement the safe macro.
I plan to update this with:
@@ -1252,15 +1252,11 @@ rte_mp_request_async(struct rte_mp_msg *req,
const struct timespec *ts,
* nothing via ID-based lookup.
*/
if (ret != 0 && reply->nb_sent > 0) {
- struct pending_request *r, *next;
-
- for (r = TAILQ_FIRST(&pending_requests.requests);
- r != NULL; r = next) {
- next = TAILQ_NEXT(r, next);
- if (r->type == REQUEST_TYPE_ASYNC &&
- r->async.param == param) {
- TAILQ_REMOVE(&pending_requests.requests,
- r, next);
+ struct pending_request *r, *tmp;
+
+ RTE_TAILQ_FOREACH_SAFE(r, &pending_requests.requests,
next, tmp) {
+ if (r->type == REQUEST_TYPE_ASYNC &&
r->async.param == param) {
+
TAILQ_REMOVE(&pending_requests.requests, r, next);
free(r->reply);
/* r->request == copy, freed below
after the loop */
free(r);
Objection?
If not, I'll update while applying.
--
David Marchand