Hi Philipp,

On 13.09.2018 01:36, Philipp Zabel wrote:
> Hi Stefan,
> 
> thank you for the report. I think what happens is the following:

Thanks for looking into it!

> 
> On Wed, 2018-09-12 at 15:26 -0700, Stefan Agner wrote:
>> Hi,
>>
>> While working on Apalis iMX6 with four displays, I encountered the
>> following crash:
>>
> [...]
>> [    3.781163] imx-drm display-subsystem: bound 120000.hdmi (ops 
>> dw_hdmi_imx_ops)
>> [    3.788441] imx-drm display-subsystem: binding ldb (ops imx_ldb_ops)
>> [    3.794902] imx_ldb_bind, imx_ldb ec0da010
> 
> component_bind calls devres_open_group right before component->ops->bind
> 
> The devmem_kzalloc'ed imx_ldb_channel that contains the encoder is
> allocated in this devres group.
> 
>> [    3.799818] drm_encoder_init, encoder ec0da388
>> [    3.804363] drm_encoder_init, ret 0
> 
> drm_encoder_init adds the encoder in imx_ldb to the
> mode_config.encoder_list.
> 
>> [    3.807908] imx_ldb_register, encoder ec0da388
>> [    3.812432] imx_ldb_register, ret 0
>> [    3.815951] imx_ldb_bind encoder LVDS-46, ec0da388 ->func c0e6371c
>> [    3.822227] imx_ldb_bind, done
>> [    3.825331] imx-drm display-subsystem: bound ldb (ops imx_ldb_ops)
>> [    3.831614] imx-drm display-subsystem: binding 
>> parallel-display-controller1 (ops imx_pd_ops)
>> [    3.840285] drm_encoder_init, encoder ec8a2b78
>> [    3.844931] imx-drm display-subsystem: Linked as a consumer to panel
>> [    3.851472] imx-drm display-subsystem: bound parallel-display-controller1 
>> (ops imx_pd_ops)
>> [    3.859785] imx-drm display-subsystem: binding 
>> parallel-display-controller0 (ops imx_pd_ops)
>> [    3.868561] imx-drm display-subsystem: failed to bind 
>> parallel-display-controller0 (ops imx_pd_ops): -517
>> [    3.878225] imx-drm display-subsystem: Dropping the link to panel
>> [    3.884679] imx_ldb_unbind
>> [    3.887421] imx_ldb_unbind, encoder LVDS-46, ec0da388 ->func c0e6371c
>> [    3.893950] imx_ldb_unbind, encoder (null), ec0da838 ->func 0
>> [    3.899723] imx_ldb_unbind, done
> 
> component_unbind calls devres_release_group after component->ops-
>>unbind, freeing imx_ldb and with it the encoder that is still in the
> mode_config.encoder_list

Oh I see, I did not realize that component_unbind releases devres...

> 
>> [    3.908345] imx_drm_bind, component_bind_all, ret -517
>> [    3.913638] drm_mode_config_cleanup, encoder TMDS-44, ec861894 ->funcs 
>> c0e63a18
>> [    3.921260] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [    3.927897] drm_encoder_cleanup, encoder ec861894
>> [    3.932717] drm_mode_config_cleanup, encoder (null), ec0da388 ->funcs 0
> 
> Use after free.
> 

That all makes sense now...


>> [    3.939356] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [    3.946050] Unable to handle kernel NULL pointer dereference at virtual 
>> address 00000004
> [...]
>> The Apalis iMX6 device tree I work with uses a dumb VGA bridge and a LVDS
>> display directly specified in the ldb node. In the case above I did not
>> compile in the dumb VGA bridge driver, that is the reason why binding
>> parallel-display-controller0 fails.
> 
> I suppose this means that component drivers must not allocate the
> encoder in the component devres group during bind. Not only their own
> failure, but any other component's failure can cause component_unbind to
> be called in the cleanup path of component_bind_all, freeing the memory
> before drm_mode_config_cleanup is called.
> 

Indeed, moving allocation into driver bind fixes the crash!

I wonder, how does devres knows that imx_ldb got allocated in probe and does 
not free on unbind? I guess that is the group mechanism which makes sure of 
that?

ldb would not the only driver affected, also 
drivers/gpu/drm/imx/parallel-display.c allocates encoders in component bind, 
there are probably more.

Actually, when thinking more about it, if we would have a way to call framework 
cleanup functions (drm_mode_config_cleanup() in this case) between the bind and 
unbind loops in component_bind_all(), we would not have that problem.

[adding Russel, since he introduced the component infrastructure]

>> The LVDS encoder (0xec1a0388 in that case) is somehow already cleaned up
>> when calling drm_mode_config_cleanup(), which leads to a null pointer
>> deference when trying to call destroy.
>>
>> I was unable to figure out why the DRM encoder in the second
>> drm_mode_config_cleanup() call is already zeroed out. The component
>> framework calls imx_ldb_unbind() first, but that should not free
>> up the encoder afaict. Any idea?
>>
>> I was also wondering in the deferred probing case, functions like
>> imx_ldb_bind() call devm_kzalloc on every try. Since the underlying
>> device is not removed, doesn't this leads to multiple active allocations?
> 
> See above. To me it looks like we have to allocate imx_ldb in probe and
> deal with possibly partially preinitialzied structure when bind is
> called a second time.
> 

Yeah with the fact that component framework does free devres at unbind time, my 
concern stated here is null...

--
Stefan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to