>>> On 7/27/2012 at 01:26 PM, Michael MacIsaac <[email protected]> wrote: 
> Nice test case! I modified it a bit :))

I have two SLES10 SP4 systems.  One is on a fairly loaded box, and one is on a 
fairly idle box.  On the loaded box, the failure rate of the chccwdev -e 
command was fairly high, even with a 2 second sleep between the vmcp define 
command and the chccwdev.  On the fairly idle box, I never saw a chccwdev -e 
failure (but I did get one chccwdev -d failure).  In both cases, I iterated 
over the script 1000 times, with the following results:

Idle SLES10 SP4:
492 cases with udevsettle - 0 failures = 100% successes
507 cases without udevsettle - 506 failures = 99.8% failures

Busy SLES10 SP4:
154 chccwdev -e failures
1 chccwdev -d failure
155 failures in 1000 iterations = 15.5% total failures.  Note that the 
chccwdev -d command _always_ follows a udevsettle command.
413 cases with udevsettle - 0 failures = 100% successes
433 cases without udevsettle - 423 failures = 97.7% failures
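
For anyone who wants to re-check the percentages quoted above, here is a small sketch.  The helper name pct is made up for illustration; the counts are the ones from the SLES10 SP4 runs.

```shell
#!/bin/sh
# pct is a hypothetical helper, not part of any tool mentioned in this thread.
# $1 = number of failures, $2 = number of cases; prints the rate to one decimal.
pct() {
    awk -v f="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * f / t }'
}

pct 506 507    # idle SLES10 SP4, no udevsettle  -> 99.8
pct 155 1000   # busy SLES10 SP4, total failures -> 15.5
pct 423 433    # busy SLES10 SP4, no udevsettle  -> 97.7
```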

In one case, the chccwdev -e failure was temporary.  In all other 153 cases, 
the entry in /sys/bus/ccw/devices/ was not created, even after 3 seconds of 
waiting.  The message from chccwdev -e was "0.0.0302 is not a channel device."  
That's while running the script in a loop.
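
The "wait up to 3 seconds for the sysfs entry" check described above could be done with a polling loop rather than a fixed sleep.  This is only a sketch, not the script used for the tests: wait_for_ccw and the CCW_SYSFS override are names I made up, and the default of three one-second polls is an assumption that mirrors the 3 seconds mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: poll for a ccw device's sysfs entry.
# CCW_SYSFS defaults to the directory named in this message; it can be
# overridden (e.g. to a scratch directory) for testing.
CCW_SYSFS="${CCW_SYSFS:-/sys/bus/ccw/devices}"

wait_for_ccw() {
    dev="$1"            # bus ID, e.g. 0.0.0302
    tries="${2:-3}"     # number of one-second polls (assumed default: 3)
    while [ "$tries" -gt 0 ]; do
        # -e follows symlinks, so a dangling link counts as "not present"
        [ -e "$CCW_SYSFS/$dev" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    # One last look after the final sleep; the exit status is the result
    [ -e "$CCW_SYSFS/$dev" ]
}
```

It would be used along the lines of "wait_for_ccw 0.0.0302 3 && chccwdev -e 0.0.0302".  It does not help in the 153 cases where the entry never appeared at all; it only shortens the wait when the entry is merely late.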

If I run the script manually each time, I see a few different failures, 
although much less frequently.  The most common is that the entry in 
/sys/bus/ccw/devices/ does show up, but only after a few seconds (3 or 
fewer).  Presumably, this case is taken care of by udevsettle.  The next most 
common (which is far less common than the first failure mode) is that the 
entry in /sys/bus/ccw/devices/ is never created, nor is the entry in 
/sys/devices/css0/.  In some rare cases, when the chccwdev -d command is 
issued, the entry in /sys/devices/css0/0.0.????/ is removed, but 
/sys/bus/ccw/devices/0.0.0302 is not, leaving a broken symbolic link.  If I 
redefine the device, I can use it again, but disabling and detaching it leaves 
the dangling symlink behind.  The only thing that seems to clear that up is a 
reboot.  I don't know whether leaving it alone would cause problems further 
down the road.
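
To spot the broken-symlink case without eyeballing ls output, something like the following could be run after each chccwdev -d.  Again just a sketch: list_dangling_ccw and the CCW_SYSFS override are hypothetical names, and it only reports the stale links, it does not try to remove them.

```shell
#!/bin/sh
# Hypothetical helper: list ccw bus entries whose symlink target is gone.
# CCW_SYSFS defaults to the directory named in this message and can be
# overridden for testing.
CCW_SYSFS="${CCW_SYSFS:-/sys/bus/ccw/devices}"

list_dangling_ccw() {
    for link in "$CCW_SYSFS"/*; do
        # -L: the entry is a symlink; ! -e: its target does not exist
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "${link##*/}"    # print only the bus ID
        fi
    done
}
```

An empty result means every entry under /sys/bus/ccw/devices/ still resolves to something.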

Idle SLES11 SP2:
502 cases with udevadm settle - 0 failures = 100% successes
498 cases without udevadm settle 
23 chccwdev -e failures = 11.5%
97 udevsettle cases = 100% success
80 no udevsettle = 100% failure.


Given the difference in results between a busy and idle SLES10 SP4 system, and 
the fact that I don't have a SLES11 SP2 guest on a busy system,  David's rate 
of about 3% failures with udevadm settle can't be ignored.

I doubt very much it's the udevadm settle timeout value.  The default is 180 
seconds for _everything_.

I think at this point, I need to state the obvious: if a customer (or business 
partner) experiencing this problem has a support contract with SUSE, Red Hat, 
or IBM and opens a support request, more effort is likely to go into 
figuring out a fix.


Mark Post

