On Tue, Apr 16, 2019 at 11:06:54AM +0000, Koenig, Christian wrote:
> Am 16.04.19 um 12:54 schrieb Karol Herbst:
> > On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
> > <christian.koe...@amd.com> wrote:
> >> Am 16.04.19 um 11:10 schrieb Karol Herbst:
> >>> On Tue, Apr 16, 2019 at 8:38 AM Christian König
> >>> <ckoenig.leichtzumer...@gmail.com> wrote:
> >>>> Am 16.04.19 um 02:35 schrieb Karol Herbst:
> >>>>> Kobjects are supposed to be dynamically allocated, but with recent 
> >>>>> changes
> >>>>> this rule was violated. Reverting those commits fixes crashes when a drm
> >>>>> driver using TTM gets loaded again.
> >>>>>
> >>>>> The object in question is "ttm_mem_glob" declared inside
> >>>>> "include/drm/ttm/ttm_memory.h" and instatiated inside
> >>>>> "drivers/gpu/drm/ttm/ttm_memory.c".
> >>>>>
> >>>>> from "Documentation/kobject.txt":
> >>>>> "Because kobjects are dynamic, they must not be declared statically or 
> >>>>> on
> >>>>> the stack, but instead, always allocated dynamically.  Future versions 
> >>>>> of
> >>>>> the kernel will contain a run-time check for kobjects that are created
> >>>>> statically and will warn the developer of this improper usage."
> >>>>>
> >>>>> Unloading ttm before reloading the driver workarounds that crash, 
> >>>>> because
> >>>>> the memory backing the kobject member "kobj" is cleaned up. The 
> >>>>> kobject_del
> >>>>> and kobject_put function never free or clean up the kobject object 
> >>>>> leaving
> >>>>> it in an undefined state.
> >>>>>
> >>>>> I reverted a few more commits to make it less painful for me to rever 
> >>>>> this
> >>>>> rather big change.
> >>>> Well, NAK. By reverting those change you also re-introduced the problems
> >>>> we originally fixed with those patches.
> >>>>
> >>>> Please work on a proper fix instead,
> >>>> Christian.
> >>> And which problem was that besides duplicated code (or maybe even a
> >>> bit of memory consumption if multiple ttm driver were used)? If I had
> >>> to choose between duplicated code and a crash, I choose the former.
> >>>
> >>> Maybe I missed the real reason why those changes are made, but the
> >>> commit messages don't really seem to tell me.
> >> The old implementation crashed because different drivers tried to
> >> allocate the same kobj.
> >>
> >> Crashing in one way is not better than crashing in another way.
> >>
> >> Christian.
> >>
> > how can that old crash be triggered? By loading two ttm based drivers
> > at the same time or by other means?
> 
> Yes, exactly. Even worse it could be triggered by one driver 
> instantiating multiple times at the same time, e.g two AMD GPUs in the 
> same system.
> 
> On the other hand I completely agree that using kobj static is 
> completely nuts. The problem is that using a kobj was the wrong approach 
> in the first place.
> 
> In other words when you have something which controls a global behavior 
> of a module, what do you use? Right, a module parameter.
> 
> Point is that we can't get away from those kobj without breaking the 
> uapi, so that is something which can't be done easily.

Randome idea: Push the kobj setup into drm.ko (and shrugg it off as
another lesson in how maybe uapi shouldn't have been designed, but hey not
our worst mistake by far). I think that'd be totally ok.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to