Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

New issues:
- [Medium] Multicast events are restricted to `init_net`, silently dropping 
events for listeners in other network namespaces despite `.netnsok = true`.

Pre-existing issues:
- [High] Lockless access to `struct drm_ras_node` pointers retrieved from 
xarray leads to Use-After-Free.
--

commit aab90b1a7792dc491d0dcac6947d349046f95876
Author: Riana Tauro <[email protected]>

drm/drm_ras: Add drm_ras netlink error event

This commit defines a new netlink event 'error-event' and a multicast
group 'error-report' for the DRM RAS subsystem. It provides a function
for drivers to report device and node error information to subscribed
userspace listeners.

Cc: Jakub Kicinski <[email protected]>
Cc: Zack McKevitt <[email protected]>
Cc: Lijo Lazar <[email protected]>
Cc: Hawking Zhang <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Raag Jadav <[email protected]>

> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index d6eab29a1394c..77f912a4d101f 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c

[ ... ]

> @@ -222,6 +255,60 @@ static int doit_reply_value(struct genl_info *info, u32 
> node_id,
>       return genlmsg_reply(msg, info);
>  }
>  
> +/**
> + * drm_ras_nl_error_event() - Report an error event
> + * @node: Node structure
> + * @error_id: ID of the error
> + * @error_name: Name of the error
> + * @value: Value associated with the error
> + * @flags: GFP flags for memory allocation
> + *
> + * Report an error-event to userspace using the error-report multicast group.
> + *
> + * Return: 0 on success, or negative errno on failure.
> + */
> +int drm_ras_nl_error_event(struct drm_ras_node *node, u32 error_id, const 
> char *error_name,
> +                        u32 value, gfp_t flags)
> +{
> +     struct genl_info info;
> +     struct sk_buff *msg;
> +     struct nlattr *hdr;
> +     int ret;
> +
> +     if (!error_name)
> +             return -EINVAL;
> +
> +     if (!genl_has_listeners(&drm_ras_nl_family, &init_net, 
> DRM_RAS_NLGRP_ERROR_REPORT))

[Severity: Medium]
Will this silently drop events for users subscribing from non-init network
namespaces? The drm_ras_nl_family sets .netnsok = true, but this check
is hardcoded to init_net.

> +             return 0;
> +
> +     genl_info_init_ntf(&info, &drm_ras_nl_family, DRM_RAS_CMD_ERROR_EVENT);
> +
> +     msg = genlmsg_new(NLMSG_GOODSIZE, flags);
> +     if (!msg)
> +             return -ENOMEM;
> +
> +     hdr = genlmsg_iput(msg, &info);
> +     if (!hdr) {
> +             ret = -EMSGSIZE;
> +             goto free_msg;
> +     }
> +
> +     ret = msg_put_error_event_attrs(msg, node, error_id, error_name, value);
> +     if (ret)
> +             goto cancel_msg;
> +
> +     genlmsg_end(msg, hdr);
> +     genlmsg_multicast(&drm_ras_nl_family, msg, 0, 
> DRM_RAS_NLGRP_ERROR_REPORT, flags);

[Severity: Medium]
Similarly, genlmsg_multicast() implicitly uses init_net under the hood. Does
this prevent containerized listeners in other network namespaces from receiving
the multicast events?

> +     return 0;
> +
> +cancel_msg:
> +     genlmsg_cancel(msg, hdr);
> +free_msg:
> +     nlmsg_free(msg);
> +     return ret;
> +}
> +EXPORT_SYMBOL(drm_ras_nl_error_event);

[Severity: High]
This isn't a bug introduced by this patch, but does locklessly accessing
drm_ras_node pointers retrieved from drm_ras_xa lead to a use-after-free?

In drivers/gpu/drm/drm_ras.c, get_node_error_counter() retrieves the node:

drivers/gpu/drm/drm_ras.c:get_node_error_counter() {
    ...
    node = xa_load(&drm_ras_xa, node_id);
    if (!node)
        return -ENOENT;
    ...
    return node->query_error_counter(node, error_id, name, value);
}

A similar lockless pattern is present in drm_ras_nl_list_nodes_dumpit():

drivers/gpu/drm/drm_ras.c:drm_ras_nl_list_nodes_dumpit() {
    ...
    xa_for_each_start(&drm_ras_xa, id, node, ctx->restart) {
        ...
        ret = nla_put_string(skb, DRM_RAS_A_NODE_ATTRS_DEVICE_NAME, 
node->device_name);
    ...
}

If a concurrent device teardown or hot-unplug removes and frees the node
while these functions are executing, will accessing node->query_error_counter
or node->device_name dereference a freed pointer?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to