On Wed, 27 Mar 2024 13:37:19 -0700, Dixit, Ashutosh wrote:
>
> On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
> >
>
> Hi Janusz,
>
> > For me, that still doesn't explain why you think that i915->hwmon reset to
> > NULL on i915 driver unregister can be the root cause of the reported UAF in
> > hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> > references to i915->hwmon in that code, and even then, that's not NULL what 
> > is
> > reported here as non-canonical address.  The code is using a no longer valid
> > pointer to an already freed (and poisoned) memory.
>
> Correct, I said basically the same thing in my reply to the patch. That the
> patch doesn't explain that ddat seems to have been freed and poisoned.
>
> > > > I think that instead of dropping i915_hwmon_unregister() we have to 
> > > > actually
> > > > unregister hwmon in the body of that function, which is called from
> > > > i915_driver_unregister() intended for closing user interfaces, then 
> > > > called
> > > > relatively early during driver unbind, so hwmon sysfs entries disappear 
> > > > before
> > > > i915 structures, especially uncore used by hwmon code, are freed.
> > >
> > > You mean uncore are freed before hwmon get unregistered, I don't think
> > > so. uncore is also drm device managed resource so in sequence I think it
> > > should be freed after i915 dev managed resources freed.
> >
> > If both uncore and hwmon are managed resources of i915 device then how can 
> > you
> > predict which of them is released first?
>
> Look at __hwmon_device_register. Here we see:
>
>       hdev->parent = dev
>
> So the hwmon device is a child device of the drm device (against which ddat
> is devm_kzalloc'd). Since a child device holds a reference against the
> parent (device_add() has get_device(dev->parent)), I would expect the hwmon
> device to disappear before the parent drm device. And I am assuming hwmon
> sysfs is linked to the hwmon device, so sysfs should disappear before ddat
> getting freed. But apparently this is not what is happening, so there's
> still something we are missing.

I looked into this a bit more. It doesn't seem to have to do with device
references. Basically, managed resources are maintained in a list and they
are removed in a reverse order from which they are added. See
release_nodes(). I believe (haven't really checked it but it looks to be
the case from the logs below) the call stack during release is:

device_release_driver ->
 __device_release_driver ->
  device_unbind_cleanup ->
   devres_release_all ->
    release_nodes

With CONFIG_DEBUG_DEVRES=y and 'echo 1 > /sys/module/devres/parameters/log'
and running the selftests I always see:

Alloc:

[  878.406556] i915 0000:03:00.0: DEVRES ADD ffff88815d8d4000 
devm_kzalloc_release (616 bytes)
[  878.407632] i915 0000:03:00.0: DEVRES ADD ffff888199e6dd40 
devm_hwmon_release (8 bytes)

Dealloc:

[  879.317635] i915 0000:03:00.0: DEVRES REL ffff888199e6dd40 
devm_hwmon_release (8 bytes)
[  879.318261] i915 0000:03:00.0: DEVRES REL ffff88815d8d4000 
devm_kzalloc_release (616 bytes)

So from the above the memory is always freed after hwmon is
unregistered. 616 bytes is 'sizeof(struct i915_hwmon)'.

So the code seems to be correct. Unless something else is going on these CI
systems where we see this crash.

If we can repro the crash locally (as I described in my other email) that
would be the best. Then we can verify any fix. If we cannot repro it
locally, then we'll need to think of something else. Basically the
direction would be to see if we eliminate devm_ and if that makes the issue
go away.

Thanks.
--
Ashutosh

Reply via email to