On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov <anatol.pomo...@gmail.com> wrote: > > Here is timeline for the crash in case if kset_find_obj() searches for > an object tht nobody holds and other thread is doing kobject_put() > on the same kobject: > > THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put()) > splin_lock() > atomic_dec_return(kobj->kref), counter > gets zero here > ... starts kobject cleanup .... > spin_lock() // WAIT thread A in > kobj_kset_leave() > iterate over kset->list > atomic_inc(kobj->kref) (counter becomes 1) > spin_unlock() > spin_lock() // taken > // it does not know that thread A > increased counter so it > remove obj from list > spin_unlock() > vfree(module) // frees module object > with containing kobj > > // kobj points to freed memory area!! > koubject_put(kobj) // OOPS!!!!
This is a much more generic bug in kobjects, and I would hate to add some random workaround for just one case of this bug like you do. The more fundamental bug needs to be fixed too. I think the more fundamental bugfix is to just fix kobject_get() to return NULL if the refcount was zero, because in that case the kobject no longer really exists. So instead of having kref_get(&kobj->kref); it should do if (!atomic_inc_not_zero(&kobj->kref.refcount)) kobj = NULL; and I think that should fix your race automatically, no? Proper patch attached (but TOTALLY UNTESTED - it seems to compile, though). The problem is that we lose the warning for when the refcount is zero and somebody does a kobject_get(), but that is ok *assuming* that people actually check the return value of kobject_get() rather than just "know" that if they passed in a non-NULL kobj, they'll get it right back. Greg - please take a look... I'm adding Al to the discussion too, because Al just *loooves* these kinds of races ;) Linus
patch.diff
Description: Binary data