On Sat, May 23, 2020 at 12:04:30PM -0700, Dmitry Torokhov wrote:
> On Sat, May 23, 2020 at 8:48 AM Randy Dunlap <[email protected]> wrote:
> >
> > On 5/23/20 8:36 AM, Greg Kroah-Hartman wrote:
> > > On Wed, May 13, 2020 at 06:18:40PM +0300, Heikki Krogerus wrote:
> > >> In the function kobject_cleanup(), kobject_del(kobj) is
> > >> called before the kobj->release(). That makes it possible to
> > >> release the parent of the kobject before the kobject itself.
> > >>
> > >> To fix that, adding function __kboject_del() that does
> > >> everything that kobject_del() does except release the parent
> > >> reference. kobject_cleanup() then calls __kobject_del()
> > >> instead of kobject_del(), and separately decrements the
> > >> reference count of the parent kobject after kobj->release()
> > >> has been called.
> > >>
> > >> Reported-by: Naresh Kamboju <[email protected]>
> > >> Reported-by: kernel test robot <[email protected]>
> > >> Fixes: 7589238a8cf3 ("Revert "software node: Simplify
> > >> software_node_release() function"")
> > >> Suggested-by: "Rafael J. Wysocki" <[email protected]>
> > >> Signed-off-by: Heikki Krogerus <[email protected]>
> > >> Reviewed-by: Rafael J. Wysocki <[email protected]>
> > >> Reviewed-by: Brendan Higgins <[email protected]>
> > >> Tested-by: Brendan Higgins <[email protected]>
> > >> Acked-by: Randy Dunlap <[email protected]>
> > >> ---
> > >> lib/kobject.c | 30 ++++++++++++++++++++----------
> > >> 1 file changed, 20 insertions(+), 10 deletions(-)
> > >
> > > Stepping back, now that it turns out this patch causes more problems
> > > than it fixes, how is everyone reproducing the original crash here?
> >
> > Just load lib/test_printf.ko and boom!
> >
> >
> > > Is it just the KUNIT_DRIVER_PE_TEST that is causing the issue?
> > >
> > > In looking at 7589238a8cf3 ("Revert "software node: Simplify
> > > software_node_release() function""), the log messages there look
> > > correct. sysfs can't create a duplicate file, and so when your test is
> > > written to try to create software nodes, you always have to check the
> > > return value. If you run the test in parallel, or before another test
> > > has had a chance to clean up, the function will fail, correctly.
> > >
> > > So what real-world thing is this test "failure" trying to show?
>
> Well, not sure about the test, but speaking more generally, should not
> we postpone releasing parent's reference until we are in
> kobj->release() handler? I.e. after all child state is cleared, and
> all memory is freed, _then_ we unpin the parent?

That's what the patch was trying to do in a way. But I think you are
right, we should _only_ be doing it at that point in time, and no other,
which the patch was not doing.

Let me go try that and see what happens...

thanks,

greg k-h
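
For illustration only, the ordering being discussed (run the child's release first, and only then drop the reference on the parent) can be sketched as a small userspace toy model. This is not the code in lib/kobject.c and not the patch under review. The struct and every toy_* name below are hypothetical stand-ins, and sysfs, locking and memory freeing are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct kobject; NOT the kernel's definition. */
struct toy_kobj {
	int refcount;
	int released;                 /* set once the release hook has run */
	int parent_alive_at_release;  /* recorded by the release hook */
	struct toy_kobj *parent;
};

void toy_put(struct toy_kobj *k);

/* Hypothetical release hook: notes whether the parent was still pinned. */
static void toy_release(struct toy_kobj *k)
{
	k->parent_alive_at_release = k->parent ? !k->parent->released : 1;
	k->released = 1;
}

/* Runs when the last reference on k is dropped. */
static void toy_cleanup(struct toy_kobj *k)
{
	struct toy_kobj *parent = k->parent;  /* saved before release */

	toy_release(k);   /* child tears down while the parent is pinned */
	toy_put(parent);  /* only now drop the reference on the parent */
}

void toy_put(struct toy_kobj *k)
{
	if (k && --k->refcount == 0)
		toy_cleanup(k);
}

/* Returns 1 if the child was released before its parent. */
int toy_demo(void)
{
	struct toy_kobj parent = { .refcount = 1 };
	struct toy_kobj child = { .refcount = 1, .parent = &parent };

	parent.refcount++;         /* the child pins its parent */
	toy_put(&parent);          /* owner drops its ref; child still pins it */
	assert(!parent.released);
	toy_put(&child);           /* last ref: release child, then unpin parent */

	return child.parent_alive_at_release && parent.released;
}
```

In this model the parent's reference is dropped at exactly one point, after the child's release hook has returned, so the hook can still look at the parent safely.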

