Here are a few theories as to why this is happening.  If any kernel
folks are reading, feel free to correct me or chime in:

1.) In RHEL 5 (kernel 2.6.18), register_netdevice() calls might_sleep().
This is a change from RHEL 4 (2.6.9), where might_sleep() was not called.
This may be related because the top of the stack (the most recently
called function) is an interrupt routine, and might_sleep() complains
when it is reached from a context that is not allowed to sleep.

2.) In kernel 2.6.18, drivers/s390/net/lcs.c, lcs_new_device():2160, the
function netif_carrier_on() is called.  This was not called in the 2.6.9
lcs code.  I can't find the bug report that necessitated this change,
but perhaps this introduced a regression.

3.) In drivers/s390/net/lcs.c, in lcs_recovery(), there is:
[snip]
rc = __lcs_shutdown_device(gdev, 1);
rc = lcs_new_device(gdev);
[snip]

This makes me wonder if there is a race condition here, since the
device is shut down and then immediately recreated, and your crash is in
lcs_new_device(), ultimately while it checks whether the sysfs group
already exists (maybe it's still lingering from
__lcs_shutdown_device()?).  Note that the 2nd argument to
__lcs_shutdown_device() is 1.  Looking at __lcs_shutdown_device(), it
calls lcs_wait_for_threads() only when the 2nd argument is 0.  Looking
through the qeth code, qeth_recover() appears to do something similar,
but it does call qeth_wait_for_threads().  Perhaps lcs_recovery() should
do the same (i.e. call __lcs_shutdown_device() with 0).
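
To make theory 3 a bit more concrete, here is a small user-space model
of the sequence.  This is only a sketch, not driver code: every model_*
name is made up for illustration, and the rule that the sysfs group
lingers while a helper thread is still busy is my assumption, not
something I have confirmed in lcs.c.

```c
/*
 * User-space model only; this is NOT the lcs driver.
 * All model_* names are hypothetical stand-ins.
 */
#include <assert.h>
#include <stddef.h>

struct model_card {
	int busy_threads;	/* helper threads still running */
	int sysfs_group;	/* 1 while the sysfs group is registered */
	int waited;		/* 1 once shutdown has waited for threads */
};

/* Stand-in for lcs_wait_for_threads(): returns once helpers are idle. */
static void model_wait_for_threads(struct model_card *card)
{
	card->busy_threads = 0;
	card->waited = 1;
}

/* Stand-in for __lcs_shutdown_device(gdev, recovery_mode). */
static int model_shutdown(struct model_card *card, int recovery_mode)
{
	assert(card != NULL);

	/* As in the text: threads are only waited for in mode 0. */
	if (recovery_mode == 0)
		model_wait_for_threads(card);

	/* ASSUMPTION: teardown cannot finish while a helper is busy. */
	if (card->busy_threads == 0)
		card->sysfs_group = 0;
	return 0;
}

/* Stand-in for lcs_new_device(): fails if stale state is left over. */
static int model_new_device(struct model_card *card)
{
	if (card->sysfs_group)
		return -1;	/* group from the old instance still there */
	card->sysfs_group = 1;
	return 0;
}

/* Mirrors the shutdown / new_device pair quoted from the driver. */
static int model_recover(struct model_card *card, int recovery_mode)
{
	int rc = model_shutdown(card, recovery_mode);

	if (rc)
		return rc;
	return model_new_device(card);
}
```

In this model, model_recover(&card, 1) on a card with one busy thread
and a registered group fails in model_new_device(), while
model_recover(&card, 0) waits first and then succeeds.  Whether the real
driver behaves this way is exactly what would need testing.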



I don't currently have an LCS type chpid defined, but if you're able to
do some testing, I could pass you a few kernels offline to test each
point above.  Also, feel free to open a bugzilla for this
(https://bugzilla.redhat.com).


--
Brad Hinson <[EMAIL PROTECTED]>
Technical Account Manager
Red Hat, Inc.

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
