Re: [Nut-upsdev] a nasty kernel oops

Charles Lepple Sun, 16 Jan 2011 00:20:27 -0800

On Jan 14, 2011, at 9:31 PM, Alfred Ganz wrote:

Charles,


Here is some more insight into my problem.
* I am now able to get a crash on a virtual machine, so life has
  become a bit easier


kernel oops, usbhid-ups crash, or simply a failure to launch usbhid-ups?

* disabling the UPS, then immediately after re-enabling it, the first
  libhid-detach-device fails after about 10 sec, with:
        hid_force_open failed with return code 7.
  i.e. no device has been found.

  Following the first libhid-detach-device immediately by several more,
  they all fail the same way, but without another 10 sec delay.
  Finally, adding a sleep 1 followed by another libhid-detach-device
  will succeed.
* disabling the UPS, then waiting 20 sec after re-enabling it, the
  first libhid-detach-device will succeed.
  Note, I wasn't able to reduce this delay significantly, so it seems
        that the total delay can be smaller when doing the above
        failing operations.
* The same behavior occurs when using lsusb -d instead of the above
  libhid-detach-device.

* usbhid-ups crashes if the last preceding libhid-detach-device fails,
  but it will not fail if there is a successful libhid-detach-device
  preceding it, or if there is a longer inactive delay.

Unfortunately, the timing is for the virtual machine, and I don't expect
things to be similar on the real machine, not to speak of the boot
context with other devices present.

As you suspected, it looks like usbhid-ups crashes if things have not
reached quiescence or some other kind of availability. However, I have
no idea how at boot time adding an active USB device can achieve this
(or maybe achieve it much more quickly).

It would of course be nice to make usbhid-ups have a built-in method for
detecting such a state and at the same time be able to detect the
absence of the device in question. However, I think the appropriate
thing is to determine such a method outside of usbhid-ups first. If at
all possible, I would prefer to do this with some shell script, but if
push comes to shove, I might have to resort to some C code as well.
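
For illustration only, an external readiness check of this kind could be prototyped as a retry loop around a probe command. The sketch below is in Python rather than shell, and the function name wait_until_ready and its timeout and interval parameters are hypothetical. It assumes the probe exits non-zero while the device is unavailable, as libhid-detach-device appears to in the report above, and it says nothing about whether repeated probing is safe on a kernel that oopses.

```python
import subprocess
import time


def wait_until_ready(probe_cmd, timeout=60.0, interval=1.0):
    """Run probe_cmd repeatedly until it exits 0 or the timeout expires.

    probe_cmd is an argument list, e.g. a device probe such as lsusb -d.
    Returns True on the first successful probe, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = subprocess.run(
            probe_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        # Mirrors the "sleep 1" between attempts described above.
        time.sleep(interval)
```

A caller would pass the probe of its choice, for example wait_until_ready(["lsusb", "-d", "VVVV:PPPP"]) with the UPS's real vendor and product IDs in place of the placeholder, and start usbhid-ups only if the result is True.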

I don't want to downplay the significance of the problem on your end,
but it is really up to the kernel to protect itself from race
conditions and crashes caused by userspace applications accessing
devices. To that end, I agree that something should be done outside
usbhid-ups.

We've had a few discussions on how the drivers should deal with USB
devices which are not there. My take on this is that we will try to
reconnect if it is a temporary disappearance, but we won't retry for
long at startup. I personally think that if the device node is not
ready by the time NUT starts, either NUT is being started too early,
or the device is not to be trusted with something as critical as
notification of power events.

That said, the HAL-style drivers are started when the device is
plugged in. While that might be nice, I don't think that's a fair
comparison because they tend to provide information about the power
situation, rather than being part of a reliable monitoring and
shutdown system.

Any advice on what might work would of course be much appreciated.

One workaround would be to patch the kernel to blacklist the UPS from
the kernel raw HID driver. Of course, this doesn't play well with
prebuilt binary kernels.

Along these lines, it should be possible to blacklist the kernel HID
module which has attached to the UPS. I haven't followed this portion
of the kernel much lately (and all bets are off in RedHat kernels),
but with any luck, it might be separate from the keyboard/mouse
HID-to-input-layer module.

A less intrusive way might be to watch the /dev space for the node
corresponding to the HID interface, and wait a few seconds after that
appears before detaching.
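
A rough sketch of that last idea, in Python for illustration: poll for the device node, then pause for a settle period before doing anything with the device. The function name wait_for_node, its parameters, and the 5-second default settle time are all assumptions, and the actual node path depends on the kernel and udev setup, so it is left to the caller.

```python
import os
import time


def wait_for_node(path, settle=5.0, timeout=60.0, poll=0.5):
    """Wait for a device node to appear, then pause before returning.

    Returns True once the node exists and the settle delay has passed,
    or False if the node did not appear within the timeout.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    # Give the kernel a few seconds after the node shows up, as
    # suggested above, before anything tries to detach the device.
    time.sleep(settle)
    return True
```

A startup script could call this with the node for the UPS's HID interface and run the detach step only when it returns True.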

Thanks, AG

P.S. What happened to the mail server at lists.alioth.debian.org

We haven't heard any updates, and I don't see any tracker items
explaining what happened, but it seems to be back now.


_______________________________________________
Nut-upsdev mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/nut-upsdev
