Hi Maxime,

On Thu Dec 11, 2025 at 6:47 PM CET, Luca Ceresoli wrote:
> Hi Maxime,
>
> On Mon Dec 1, 2025 at 5:51 PM CET, Maxime Ripard wrote:
>> On Mon, Nov 24, 2025 at 05:25:39PM +0100, Luca Ceresoli wrote:
>>> Hi Maxime,
>>>
>>> On Mon Nov 24, 2025 at 11:39 AM CET, Maxime Ripard wrote:
>>> > On Wed, Nov 19, 2025 at 02:05:37PM +0100, Luca Ceresoli wrote:
>>> >> Several drivers (about 20) follow the same pattern:
>>> >>
>>> >>  1. get a pointer to a bridge (typically the next bridge in the chain) by
>>> >>     calling of_drm_find_bridge()
>>> >>  2. store the returned pointer in the private driver data, keep it until
>>> >>     driver .remove
>>> >>  3. dereference the pointer at attach time and possibly at other times
>>> >>
>>> >> of_drm_find_bridge() is now deprecated because it does not increment the
>>> >> refcount and should be replaced with drm_of_find_bridge() +
>>> >> drm_bridge_put().
>>> >>
>>> >> However some of those drivers have a complex code flow and adding a
>>> >> drm_bridge_put() call in all the appropriate locations is error-prone,
>>> >> leads to ugly and more complex code, and can lead to errors over time 
>>> >> with
>>> >> code flow changes.
>>> >>
>>> >> To handle all those drivers in a straightforward way, add a devm variant 
>>> >> of
>>> >> drm_of_find_bridge() that adds a devm action to invoke drm_bridge_put()
>>> >> when the said driver is removed. This allows all those drivers to put the
>>> >> reference automatically and safely with a one line change:
>>> >>
>>> >>   - priv->next_bridge = of_drm_find_bridge(remote_np);
>>> >>   + priv->next_bridge = devm_drm_of_find_bridge(dev, remote_np);
>>> >>
>>> >> Signed-off-by: Luca Ceresoli <[email protected]>
>>> >>
>>> >> ---
>>> >>  drivers/gpu/drm/drm_bridge.c | 30 ++++++++++++++++++++++++++++++
>>> >>  include/drm/drm_bridge.h     |  5 +++++
>>> >>  2 files changed, 35 insertions(+)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
>>> >> index 09ad825f9cb8..c7baafbe5695 100644
>>> >> --- a/drivers/gpu/drm/drm_bridge.c
>>> >> +++ b/drivers/gpu/drm/drm_bridge.c
>>> >> @@ -1446,6 +1446,36 @@ struct drm_bridge *drm_of_find_bridge(struct 
>>> >> device_node *np)
>>> >>  }
>>> >>  EXPORT_SYMBOL(drm_of_find_bridge);
>>> >>
>>> >> +/**
>>> >> + * devm_drm_of_find_bridge - find the bridge corresponding to the device
>>> >> + *                           node in the global bridge list and add a 
>>> >> devm
>>> >> + *                           action to put it
>>> >> + *
>>> >> + * @dev: device requesting the bridge
>>> >> + * @np: device node
>>> >> + *
>>> >> + * On success the returned bridge refcount is incremented, and a devm
>>> >> + * action is added to call drm_bridge_put() when @dev is removed. So the
>>> >> + * caller does not have to put the returned bridge explicitly.
>>> >> + *
>>> >> + * RETURNS:
>>> >> + * drm_bridge control struct on success, NULL on failure
>>> >> + */
>>> >> +struct drm_bridge *devm_drm_of_find_bridge(struct device *dev, struct 
>>> >> device_node *np)
>>> >> +{
>>> >> +        struct drm_bridge *bridge = drm_of_find_bridge(np);
>>> >> +
>>> >> +        if (bridge) {
>>> >> +                int err = devm_add_action_or_reset(dev, 
>>> >> drm_bridge_put_void, bridge);
>>> >> +
>>> >> +                if (err)
>>> >> +                        return ERR_PTR(err);
>>> >> +        }
>>> >> +
>>> >> +        return bridge;
>>> >> +}
>>> >> +EXPORT_SYMBOL(devm_drm_of_find_bridge);
>>> >
>>> > That's inherently unsafe though, because even if the bridge is removed
>>> > other parts of DRM might still have a reference to it and could call
>>> > into it.
>>> >
>>> > We'd then have dropped our reference to the next bridge, which could
>>> > have been freed, and it's a use-after-free.
>>>
>>> I think you refer to this scenario:
>>>
>>>   1. pipeline: encoder --> bridge A --> bridge B --> bridge C
>>>   2. encoder takes a reference to bridge B
>>>      using devm_drm_of_find_bridge() or other means
>>>   3. bridge B takes a next_bridge reference to bridge C
>>>      using devm_drm_of_find_bridge()
>>>   4. encoder calls (bridge B)->foo(), which in turns references
>>>      next_bridge, e.g.:
>>>
>>>        b_foo() {
>>>            bar(b->next_bridge);
>>>        }
>>>
>>> If bridges B and C are removed, bridge C can be freed but B is still
>>> allocated because the encoder holds a ref. So when step 4 happens, 'b->c'
>>> would be a use-after-free (or NULL deref if b.remove cleared it, which is
>>> just as bad).
>>
>> Yep.
>>
>>> If I got you correctly, then I'm a bit surprised by your comment. This
>>> series is part of the first chapter of the hotplug work, which does not aim
>>> at fixing everything but rather at fixing one part: handle dynamic
>>> _allocation_ lifetime of drm_bridges by adding a refcount and
>>> drm_bridge_get/put().
>>>
>>> Chapter 2 of the work is adding drm_bridge_enter/exit/unplug() [1] and
>>> other changes in order to avoid code of drivers of removed bridges to
>>> access fields they shouldn't. So the above example at point 4 would become:
>>>
>>>        b_foo() {
>>>            if (!drm_bridge_enter())
>>>                return;
>>>            bar(b->c);
>>>            drm_bridge_exit();
>>>        }
>>>
>>> And that avoids 'b->c' after bridge B is removed.
>>>
>>> Does that answer your remark?
>>
>> Not really. I wasn't really questionning your current focus, or the way
>> you laid out the current agenda or whatever.
>>
>> What I am questionning though is whether or not we want to introduce
>> something we will have to untangle soon, and even more so when we're not
>> mentioning it anywhere.
>>
>>> > It's more complicated than it sounds, because we only have access to the
>>> > drm_device when the bridge is attached, so later than probe.
>>> >
>>> > I wonder if we shouldn't tie the lifetime of that reference to the
>>> > lifetime of the bridge itself, and we would give up the next_bridge
>>> > reference only when we're destroyed ourselves.
>>>
>>> I'm afraid I'm not following you, sorry. Do you refer to the time between
>>> the bridge removal (driver .remove) and the last bridge put (when
>>> deallocation happens)?
>>>
>>> In that time frame the struct drm_bridge is still allocated along with any
>>> next_bridge pointer it may contain, but the following bridge could have
>>> been deallocated.
>>>
>>> What do you mean by "give up the next_bridge"?
>>
>> What I was trying to say was that if we want to fix the problem you
>> illustrated about, we need to give up the reference at __drm_bridge_free
>> time. So each bridge having a reference to a bridge would need to do so
>> in its destroy hook.
>>
>> Since it's quite a common pattern, it would make sense to add a
>> next_bridge field to drm_bridge itself, so the core can do it
>> automatically in __drm_bridge_free if that pointer is !NULL.
>>
>> But...
>>
>>> > Storing a list of all the references we need to drop is going to be
>>> > intrusive though, so maybe the easiest way to do it would be to create a
>>> > next_bridge field in drm_bridge, and only drop the reference stored
>>> > there?
>>> >
>>> > And possibly tie the whole thing together using a helper?
>>> >
>>> > Anyway, I'm not sure it should be a prerequisite to this series. I we do
>>> > want to go the devm_drm_of_find_bridge route however, we should at least
>>> > document that it's unsafe, and add a TODO entry to clean up the mess
>>> > later on.
>>
>> ... I *really* don't consider it something you need to work on right now.
>>
>>> Do you mean the drm variant is unsafe while the original
>>> (drm_of_find_bridge() in this series, might be renamed) is not? I
>>> don't see how that can happen. If the driver for bridge B were to use
>>> drm_of_find_bridge(), that driver would be responsible to
>>> drm_bridge_put(b->next_bridge) in its .remove() function or earlier.
>>> So the next_bridge pointing to bridge C would equally become subject
>>> to use-after-free.
>>
>> No, I was saying that both are equally unsafe. But we're adding a new,
>> broken, helper, and we don't mention anywhere that it is. So what I was
>> saying is mostly do we really want to introduce some more broken code
>> when we know it is. And if we do, we should be really clear about it.
>>
>>> devm does not make it worse, on the opposite it postpones the
>>> drm_bridge_put(next_bridge) as late as possible: just after
>>> b.remove().
>>
>> Which doesn't really change anything, does it? I'd expect the window
>> between the remove and final drm_bridge_put to be much wider than the
>> execution time of remove itself.
>>
>>> One final, high-level thought about the various 'next_bridge' pointers that
>>> many bridge drivers have. Most of them do:
>>>
>>>  0. have a 'struct drm_bridge next_bridge *' in their private struct
>>>  1. take the next_bridge reference during probe or another startup phase
>>>  2. store it in their private driver struct
>>>  3. use it to call drm_bridge_attach
>>>  4. (pending) put the reference to it in their .remove or earlier
>>>
>>> I'm wondering whether we could let the DRM bridge core do it all, by
>>> removing items 0, 1, 2 and 4, and change 3 as:
>>>
>>> -     drm_bridge_attach(encoder, me->next_bridge, &me->bridge, flags);
>>> +  drm_of_bridge_attach(encoder, &me->bridge, dev->of_node, 1, -1, flags);
>>>
>>> where dev->of_node and the following integers are the same flags passed to
>>> devm_drm_of_get_bridge() and the like, i.e. the endpoint info needed to
>>> walk the DT graph and reach the next bridge.
>>>
>>> This would allow the core to take care of all locking and lifetime of the
>>> next bridge, and most (all?) bridges would never access any pointers to the
>>> next bridge. The idea is to let the core do the right thing in a single
>>> place instead of trying to make all drivers do the right thing (and
>>> touching dozen files when needing to touch the logic).
>>>
>>> That is more a long-term ideal than something I'd do right now, but having
>>> opinions would be very interesting.
>>
>> That was pretty much my point, yeah.
>>
>> Maxime
>
> Let me recap this discussion, because there are various aspects and I need
> to clarify by view on it.
>
> First: the problem you discuss is about drm_of_find_bridge() introduced in
> patch 1. The devm variant is just equally affected.
>
> You proposed adding a next_bridge field in struct drm_bridge so there is an
> automated, common call to drm_bridge_put() (and setting it to NULL). It
> would remove some burden on individual drivers of course, but I don't think
> it would solve the problem. In the same scenario we are discussing
> (i.e. encoder --> bridge A --> bridge B --> bridge C, then B+C get removed)
> B's next_bridge would be automatically put, but the encoder could still
> call B->foo(), which could still do B->next_bridge.

Ah, I realied I'm wrong here. Your proposal is to put the reference at
__drm_bridge_free time, not a release time. So yes, it would work. At least
for the simple cases, where there's only the next_bridge pointer stored.

> Additionally, as a matter of fact there are currently drivers storing
> bridge pointers. The next_bridge is the most common case. Code using
> drm_bridge_connector_init() for example can store up to eight of them, but
> individual drivers are the hardest to hunt for.
>
> I can see these (potential) tools to handle this (not mutually exclusive):
>
>  1. remove drm_bridge pointers pointing to other bridges
>  2. check whether a bridge (say B) still exists before any dereference
>     to B->another_bridge: that's drm_bridge_enter/exit()
>  3. let owners of bridge pointers be notified when a bridge is unplugged,
>     so they can actively put their reference and clear their pointer
>
> For item 1, I think the drm_of_bridge_attach() idea quoted above would
> work, at least for the simple cases where bridge drivers use the
> next_bridge only for attach. A next_bridge pointer in struct drm_bridge is
> not even needed in that case, the pointer would be computed from OF when
> needed and not stored. I can do an experiment and send a first series, do
> you think it would be useful?

I had a look and, while the implementation should be simple, only a few
drivers could benefit right now. The majority fall into one of these
categories:

 * drivers using drm_of_find_panel_or_bridge() or *_of_get_bridge()
   (maybe 60-80% of all drivers, those will have to wait for the panel
   improvements)
 * drivers using the next_bridge pointer for more than just attach
 * drivers doing more complicated stuff

I think your "put next_bridge in __drm_bridge_free" idea would fit well the
2nd category and perhaps also the 1st one. For the 3rd category we'd need
something different, e.g. a per-driver .destroy callback.

So, while your idea would work, it would avoid use-after-free but not
prevent calls into a bridge code after the bridge is removed, which is, in
the best case, useless. I still think we should aim at avoiding the
dereferences to even happen, so my 3 ideas above still look to me important
to evaluate.

> For item 2, I still think it would just work for cases not (or not yet)
> covered by item 1. Given the same example as above:
>
>         b_foo() {
>             if (!drm_bridge_enter())
>                 return;
>             bar(b->c);
>             drm_bridge_exit();
>         }
>
> drm_bridge_enter() will prevent doing anything on b->whatever, including
> any stale pointers. The idea is "I, bridge B, have been unplugged, my
> resources are not reliably usable, keep out!". No?
>
> However item 2 won't prevent the caller from keeping the pointer for a long
> time, especially if b_foo() is a void function so it cannot return anything
> saying "I was unplugged, I'm no longer theree". And that's where item 3
> (notifiers) might come in useful: upon receiving a bridge removal
> notification any driver or other code is in a position to actively put its
> reference to the bridge and clear its pointer. I had proposed something in
> [0], which can likely be improved, but it is useful in my current draft
> solution.
>
> I'd like to know your opinions about the above points, so we can
> progressively clarify the path forward.
>
> PS: I just realized about a fourth option: a revocable [1] (thanks Laurent
> who pointed me to it). That's basically a wrapper around a pointer that
> allows to guard from accessing it when the pointed object is gone. To me it
> looks somewhat equivalent to drm_bridge_enter/exit() but with very
> different APIs. I still haven't looked at the details.
>
> [0] 
> https://lore.kernel.org/lkml/[email protected]/
> [1] 
> https://lore.kernel.org/lkml/[email protected]/#t

Luca


--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Reply via email to