On Tue, Apr 16, 2019 at 11:06:54AM +0000, Koenig, Christian wrote: > Am 16.04.19 um 12:54 schrieb Karol Herbst: > > On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian > > <christian.koe...@amd.com> wrote: > >> Am 16.04.19 um 11:10 schrieb Karol Herbst: > >>> On Tue, Apr 16, 2019 at 8:38 AM Christian König > >>> <ckoenig.leichtzumer...@gmail.com> wrote: > >>>> Am 16.04.19 um 02:35 schrieb Karol Herbst: > >>>>> Kobjects are supposed to be dynamically allocated, but with recent > >>>>> changes > >>>>> this rule was violated. Reverting those commits fixes crashes when a drm > >>>>> driver using TTM gets loaded again. > >>>>> > >>>>> The object in question is "ttm_mem_glob" declared inside > >>>>> "include/drm/ttm/ttm_memory.h" and instatiated inside > >>>>> "drivers/gpu/drm/ttm/ttm_memory.c". > >>>>> > >>>>> from "Documentation/kobject.txt": > >>>>> "Because kobjects are dynamic, they must not be declared statically or > >>>>> on > >>>>> the stack, but instead, always allocated dynamically. Future versions > >>>>> of > >>>>> the kernel will contain a run-time check for kobjects that are created > >>>>> statically and will warn the developer of this improper usage." > >>>>> > >>>>> Unloading ttm before reloading the driver workarounds that crash, > >>>>> because > >>>>> the memory backing the kobject member "kobj" is cleaned up. The > >>>>> kobject_del > >>>>> and kobject_put function never free or clean up the kobject object > >>>>> leaving > >>>>> it in an undefined state. > >>>>> > >>>>> I reverted a few more commits to make it less painful for me to rever > >>>>> this > >>>>> rather big change. > >>>> Well, NAK. By reverting those change you also re-introduced the problems > >>>> we originally fixed with those patches. > >>>> > >>>> Please work on a proper fix instead, > >>>> Christian. > >>> And which problem was that besides duplicated code (or maybe even a > >>> bit of memory consumption if multiple ttm driver were used)? If I had > >>> to choose between duplicated code and a crash, I choose the former. > >>> > >>> Maybe I missed the real reason why those changes are made, but the > >>> commit messages don't really seem to tell me. > >> The old implementation crashed because different drivers tried to > >> allocate the same kobj. > >> > >> Crashing in one way is not better than crashing in another way. > >> > >> Christian. > >> > > how can that old crash be triggered? By loading two ttm based drivers > > at the same time or by other means? > > Yes, exactly. Even worse it could be triggered by one driver > instantiating multiple times at the same time, e.g two AMD GPUs in the > same system. > > On the other hand I completely agree that using kobj static is > completely nuts. The problem is that using a kobj was the wrong approach > in the first place. > > In other words when you have something which controls a global behavior > of a module, what do you use? Right, a module parameter. > > Point is that we can't get away from those kobj without breaking the > uapi, so that is something which can't be done easily.
Randome idea: Push the kobj setup into drm.ko (and shrugg it off as another lesson in how maybe uapi shouldn't have been designed, but hey not our worst mistake by far). I think that'd be totally ok. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel