2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <h...@goldelico.com>: > Hi, > we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer > problem within the bq27x00 driver. > > It appears to be introduced by your patch > 297d716f6260cc9421d971b124ca196b957ee458 > > The problem appears to be that bq27x00_powersupply_init() calls > power_supply_register_no_ws() and > sets di->bat *after* return. The old code did pass an uninitialized struct > pointer. > > Now for reasons I don’t understand, the power_supply_register_no_ws() appears > to call > uevent related stuff which in turn calls bq27x00_battery_get_property() > before di->bat > is properly initialized. > > I have checked with printk in bq27x00_battery_get_property() that di>bat == > NULL in this case and > right before we see the segfault. > > The old code simply did pass a zeroed struct power_supply and perhaps > initialized its components > during registration. > > Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer > dereference but > I don’t know what it does to the uevent and if it restores previous operation. > > It could have been that it was for good purpose that > power_supply_register_no_ws() did not return > by value, but by reference to the di->bat struct: > > - ret = power_supply_register_no_ws(di->dev, &di->bat, NULL); > + di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg); > > So that code called within the context of power_supply_register_no_ws() could > already > refer to initialized di->bat.
Indeed this issue was not present in previous design however I think the architecture was still error-prone. Starting from driver's probe: - some_driver_probe() - power_supply_register(&psy) - device_add() - kobject_uevent() - notify userspace - power_supply_uevent() - power_supply_show_property() - power_supply_get_property() - some_driver_get_property() So before probe() ends the power supply core calls driver's get_property(). In that time the driver internal structures may not be ready yet, because the probe did not end. The get_property() could access some registers or other core functions which will be ready after power_supply_register() call. For example the bq27x00_battery_get_property() accesses delayed work which could be (by mistake) not yet initialized. I looked at other dev_uevent implementations and almost all of them do not call back the driver. Of course this does not change that I introduced the issue and I feel bad about it. I got some ideas for resolving it: 1. Fix bq27x00_battery and maybe others so they won't access the 'psy' member of its state container. This does not solve the issue from architectural point of view - still some uninitialised data may be accessed because probe() is in progress. However this would be the fastest and the least intrusive. 2. Remove calls to get_property() (and other functions provided by driver) from power_supply_uevent(). Unfortunately this may break userspace which expects that some data will be present in uevent. 3. Change the API to the previous convention, which you pointed as a remedy: ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg); This also won't solve the issue from 1. that uevent will be called before probe ends. 4. Block the power_supply_uevent() from calling the get_property() functions until device_add() finishes. This would solve this issue but first uevent messages (from adding device) won't contain all of device data (just like in solution no 2.) and this won't solve other race - someone may call sysfs show() on created device nodes and thus access the get_property() before probe finishes. Any ideas? Best regards, Krzysztof -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/