Hello Marco, Liu, On Thu Mar 12, 2026 at 10:15 AM CET, Liu Ying wrote: > On Wed, Mar 11, 2026 at 08:45:25PM +0100, Marco Felsch wrote: >> Hi Liu, Luca, > > Hi, > >> >> sorry for the delayed response, I was at the EW26. > > Np at all, it's good to go out to meet people sometimes :) > >> >> On 26-03-10, Luca Ceresoli wrote: >>> Hi Liu, Marco, >>> >>> On Tue Mar 10, 2026 at 3:57 AM CET, Liu Ying wrote: >>>> Hi Marco, Luca, >>>> >>>> On Tue, Mar 03, 2026 at 11:34:27AM +0100, Marco Felsch wrote: >>>> >>>> [...] >>>> >>>>> + next_bridge = devm_drm_of_get_bridge(dev, dev->of_node, 1, 0); >>>>> + if (IS_ERR(next_bridge)) >>>>> + return dev_err_probe(dev, PTR_ERR(next_bridge), >>>>> + "failed to get next bridge\n"); >>>>> + pdfc->dev = dev; >>>>> + pdfc->bridge.of_node = dev->of_node; >>>>> + pdfc->bridge.type = DRM_MODE_CONNECTOR_DPI; >>>>> + pdfc->bridge.next_bridge = next_bridge; >>>> >>>> When I was reviewing another patch[1], I was aware of the necessity of >>>> calling drm_bridge_get() for next_bridge to balance the next bridge's >>>> refcount put from __drm_bridge_free() for this bridge. I'd be good if >>>> Luca may confirm this is correct. Sorry for bringing this up late. >>> >>> Indeed you have a good point. >> >> At which stage did you faced this issue? During driver probe, because of >> EPROBE_DEFER? > > I just want to balance the get/put for the next bridge's refcount by > calling drm_bridge_get() for next_bridge, as I said above. There could > be some way to trigger some particular Use-After-Free issues if you don't > do that. I did take some time today to try to trigger UAF, but no luck, > though I found a potential bug in encoder_bridges_show() and generated a > fix[1](Cc'ed Marco) when I played with debugfs to read the refcounts.
The whole need for refcounting DRM bridges exists if such bridges can be removed when other parts of the kernel still have a pointer (= a reference) to them. This is not the case with current DRM because you cannot remove some bridges without removing the whole pipeline (card). However I am working since a long time to suppor hardware where the last part of the display pipeline is hot-pluggable, causing DRM bridges to be added and removed at runtime without removal of the initial part of the pipeline (CRCT, encoder and 0 or more bridges). See these links for more background info: * ELCE 2025: - https://www.youtube.com/watch?v=C8dEQ4OzMnc - https://bootlin.com/pub/conferences/2025/elce/ceresoli-hotplug-status.pdf * Lots of links in the slides about previous activity * LPC 2024 has a general description fo the use case: - https://lpc.events/event/18/contributions/1696/ > My idea to trigger UAF is to remove imx_lcdif/imx93_pdfc/simple_panel > modules when doing pageflips. Dunno if EPROBE_DEFER may trigger UAF. Perhaps there is some possible UAF on probe errors or deferrals, but the typical case is when removing the final part of a pipeline, including 1+ bridges. This was discussed in detail in [0], but here's an excerpt with the typical UAF scenario with hotplug: 1. pipeline: encoder --> bridge A --> bridge B --> bridge C 2. encoder takes a reference to bridge B using devm_drm_of_find_bridge() or other means 3. bridge B takes a next_bridge reference to bridge C using devm_drm_of_find_bridge() 4. encoder calls (bridge B)->foo(), which in turns references next_bridge, e.g.: b_foo() { bar(b->next_bridge); } If bridges B and C are removed, bridge C can be freed but B is still allocated because the encoder holds a ref. So when step 4 happens, 'b->c' would be a use-after-free (or NULL deref if b.remove cleared it, which is just as bad). [0] https://lore.kernel.org/all/[email protected]/ >> That's the reason for having the local next_bridge variable since I >> faced with the same issue. In other words this driver is correct and >> it's on purpose to not assign it directly. Albeit I could/should have >> added a comment. > > What's the issue your faced? How would using next_bridge variable impact? > Without knowing these info, I presume that you still need to call > drm_bridge_get() for next_bridge, since it's kind of obvious if you take > a look at __drm_bridge_free(). Exactly. The idea in a nutshell is: every pointer to a drm_bridge stored somewhere is a reference to a bridge. So: * When you take a reference (= you set a pointer to the address of a bridge) you need to call drm_bridge_get() to increment the refcount. * When you clear a reference (set it to NULL) or the pointer becomes unreachable (the device goes away, the struct holding it is not reachable anymore, etc) you need to call drm_bridge_put() to decrement the refcount. This way the struct drm_bridge (or, typically, the private driver struct embedding it) will be freed only when the refcount goes to 0, avoiding UAF. I have been converting most of the accessors returning a drm_bridge pointer in order to drm_bridge_get() the bridge before returning it. A few random examples: - https://lore.kernel.org/lkml/20250620-drm-bridge-alloc-getput-drm-bridge-c-v9-0-ca53372c9...@bootlin.com/ - https://lore.kernel.org/lkml/20250709-drm-bridge-alloc-getput-drm_bridge_get_prev_bridge-v1-0-34ba6f395...@bootlin.com/ - https://lore.kernel.org/lkml/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-2-v2-4-8bad3ef90...@bootlin.com/ I hope this clarifies the general refcount topic. Now to *_of_get_bridge(). >>> After re-checking devm_drm_of_get_bridge(), as I wrote on the other thread >>> you pointed to, you should call drm_bridge_get(): >>> >>> - pdfc->bridge.next_bridge = next_bridge; >>> + pdfc->bridge.next_bridge = drm_bridge_get(next_bridge); >>> >>> Marco, you can keep my R-by if you resend with just this change. >>> >>> Sorry about the confusion here. >>> >>> As mention on the other thread, devm_drm_of_get_bridge() is unable to >>> support bridge hotplug. So it should be deprecated, but as of now there is >>> no alternative. >> >> Sorry I need a bit more context. What's the issue? The issue with devm_drm_of_get_bridge(MYDEV, ...) (similarly for other *_of_get_bridge() functions) is it can return one of: A) an existing bridge, returned by drm_of_find_panel_or_bridge() in the &bridge parameter B) a panel_bridge it creates on the fly calling devm_drm_panel_bridge_add(MYDEV) based on the panel returned by drm_of_find_panel_or_bridge() in the &panel parameter The caller of devm_drm_of_get_bridge() gets a drm_bridge pointer but doesn't know whether case A or B happened. This is not a problem if the entire card is torn down at once because devm/drmm will take care or the removal. Even easier if the card is not torn down at all. Now imagine adding hotplug so you can keep MYDEV present but remove any bridges after MYDEV and the panel. The devm action added by devm_drm_of_get_bridge() won't trigger (because MYDEV is not going away). The caller of devm_drm_of_get_bridge(), i.e. the MYDEV driver, should remove the panel_bridge in case B but it has no way to know whether case A or B happened. So, this is tricky. The good news for you is you're not supposed to fix it. The *_of_get_bridge() API is just unable to handle hotplug, that will be fixed at some point. For now just ensure you get a reference for every bridge pointer you store, and to put it when that reference goes away. >> How can I trigger the >> issue? See the example above. >> Why is bridge hotplug required at this stage? Because there is hot-pluggable hardware we want to support, see the ELCE and even better the LPC links above. >> Why is only this >> bridge affecte by the hotplug issue? It's not only this driver. Hotplug can potentially be used on any hardware, so I'm working to implement it in a general way in the DRM common code, but all drviers need to at least do the refcounting on their side. I'm trying to catch all patches involving bridge pointers and check whether they do refcounting right. This is just one of many patches I have commented about in the past months. I hope this clarified a bit. Best regards, Luca -- Luca Ceresoli, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
