Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Lockless access to `drm_ras_node` pointers from the global `drm_ras_xa` XArray leads to Use-After-Free and control-flow hijacking during concurrent device teardown.
- [Medium] Missing generation counter and NLM_F_DUMP_INTR flag for lockless Netlink dumps.
--

commit 17d70fc870ba118f3745064732f976f0856f2d46
Author: Raag Jadav <[email protected]>

drm/ras: Cancel and free message on get counter failure

This commit fixes a resource leak in doit_reply_value() where sk_buff and
the genetlink header were not cleaned up on a get counter failure. It also
consolidates the error handling using a goto pattern.

> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index d6eab29a1394c..467a169026fc1 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c
> @@ -201,25 +201,28 @@ static int doit_reply_value(struct genl_info *info, u32 node_id,

[ ... ]

>  	ret = get_node_error_counter(node_id, error_id,
>  				     &error_name, &value);

[Severity: Critical]
This is a pre-existing issue, but does lockless access to drm_ras_node
pointers from the global drm_ras_xa XArray lead to a use-after-free during
concurrent device teardown?

The drm_ras netlink operations use .parallel_ops = true, meaning handlers
like get_node_error_counter() run concurrently without holding the global
genl_lock().

When retrieving a drm_ras_node pointer via xa_load(), the XArray API drops
the RCU read lock immediately after the lookup, returning a raw, unprotected
pointer:

drivers/gpu/drm/drm_ras.c:get_node_error_counter() {
    node = xa_load(&drm_ras_xa, node_id);
    if (!node || !node->query_error_counter)
        return -ENOENT;
    ...
    return node->query_error_counter(node, error_id, name, value);
}

Concurrently, a driver can unregister its nodes during teardown, which
removes the node from the XArray and synchronously frees the node memory via
DRM's managed cleanup.
Could the netlink handler then dereference this freed pointer, calling
function pointers read from freed memory?

This same pattern appears in drm_ras_nl_list_nodes_dumpit() and
get_node_error_threshold().

[Severity: Medium]
This is also a pre-existing issue, but are we missing a generation counter
and NLM_F_DUMP_INTR flag for lockless Netlink dumps?

The Generic Netlink UAPI rules dictate: "If iteration during a dump may skip
or repeat objects (e.g. due to lockless data structures), set
NLM_F_DUMP_INTR on the affected message(s)."

drm_ras_nl_list_nodes_dumpit() and drm_ras_nl_get_error_counter_dumpit()
iterate over the lockless drm_ras_xa XArray across potentially multiple
netlink continuations:

drivers/gpu/drm/drm_ras.c:drm_ras_nl_list_nodes_dumpit() {
    xa_for_each_start(&drm_ras_xa, id, node, ctx->restart) {
        ...
    }
}

Because drm_ras_xa can be modified concurrently by node registration or
unregistration, a multipart dump might skip newly registered elements.

Should the implementation maintain a generation counter or set
NLM_F_DUMP_INTR on the netlink message to inform userspace of the sequence
change?

>  	if (ret)
> -		return ret;
> +		goto cancel_msg;
>
>  	ret = msg_reply_value(msg, error_id, error_name, value);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=1
