>>> On 7/27/2012 at 01:26 PM, Michael MacIsaac <[email protected]> wrote:
> Nice test case! I modified it a bit :))
I have two SLES10 SP4 systems. One is on a fairly loaded box, and one is on a fairly idle box. On the loaded box, the failure rate of the chccwdev -e command was fairly high, even with a 2 second sleep between the vmcp define command and the chccwdev. On the fairly idle box, I never saw a chccwdev -e failure (but I did get one chccwdev -d failure). In both cases, I iterated over the script 1000 times, with the following results:

Idle SLES10 SP4:
  492 cases with udevsettle - 0 failures = 100% successes
  507 cases without udevsettle - 506 failures = 99.8% failures

Busy SLES10 SP4:
  154 chccwdev -e failures
  1 chccwdev -d failure
  = 15.5% total failures. Note that the chccwdev -d command _always_ follows a udevsettle command.
  413 cases with udevsettle - 0 failures = 100% successes
  433 cases without udevsettle - 423 failures = 97.7% failures

In one case, the chccwdev -e failure was temporary. In all other 153 cases, the entry in /sys/bus/ccw/devices/ was not created, even after 3 seconds of waiting. The message from chccwdev -e was "0.0.0302 is not a channel device."

That's while running the script in a loop. If I run the script manually each time, I tend to see a couple of different failures, although much less frequently. The most common is that the entry in /sys/bus/ccw/devices/ does show up, but only after a few seconds (3 or less). Presumably, this case is taken care of by udevsettle. The next most common (which is far less common than the first failure mode) is that the entry in /sys/bus/ccw/devices/ is never created, nor is the entry in /sys/devices/css0/.

In some rare cases, when the chccwdev -d command is issued, the entry in /sys/devices/css0/0.0.????/ is removed, but the /sys/bus/ccw/devices/0.0.0302/ entry is not, leading to a broken symbolic link. If I redefine the device, I can use it again, but disabling it and detaching it leaves the dangling symlink. The only thing that seems to clear that up is a reboot.
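
For anyone who wants to check for the two sysfs symptoms above by hand, here is a rough sketch. It is not the test script from this thread: the function names, the 5 second default timeout, and the optional directory argument are all assumptions made for illustration. The first function polls for the device entry instead of sleeping a fixed time, and the second lists entries whose symlink target has gone away.

```shell
# Hypothetical helpers, not part of the original test script.

# Poll for a CCW device entry in sysfs until it appears or a timeout expires.
#   $1 = bus ID, e.g. 0.0.0302
#   $2 = timeout in seconds (assumed default: 5)
#   $3 = directory to watch (defaults to /sys/bus/ccw/devices; the override
#        exists only so the function can be tried out off-host)
# Returns 0 if the entry exists, 1 if the timeout was reached.
wait_for_ccw_dev() {
    busid=$1
    timeout=${2:-5}
    sysdir=${3:-/sys/bus/ccw/devices}
    waited=0
    # -e follows symlinks, so a dangling link counts as "not there"
    while [ ! -e "$sysdir/$busid" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Print the names of symlinks in the directory whose target no longer
# exists, i.e. the broken-link case seen after chccwdev -d.
list_dangling_ccw_links() {
    sysdir=${1:-/sys/bus/ccw/devices}
    for link in "$sysdir"/*; do
        # -L: it is a symlink; ! -e: its target does not resolve
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "${link##*/}"
        fi
    done
}
```

The idea would be to call wait_for_ccw_dev 0.0.0302 between the vmcp define and the chccwdev -e 0.0.0302, and only attempt the enable if it returns 0. At best this covers the case where the entry shows up late. It does nothing for the cases where the entry is never created.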
I don't know whether leaving the dangling symlink alone would cause any problems further down the road.

Idle SLES11 SP2:
  502 cases with udevadm settle - 0 failures, 100% successes
  498 cases without udevadm settle
  23 chccwdev -e failures = 11.5%
  97 udevsettle cases = 100% success
  80 no udevsettle = 100% failure

Given the difference in results between a busy and an idle SLES10 SP4 system, and the fact that I don't have a SLES11 SP2 guest on a busy system, David's rate of about 3% failures with udevadm settle can't be ignored. I doubt very much it's the udevadm settle timeout value. The default is 180 seconds for _everything_.

I think at this point, I need to state the obvious: if a customer (or business partner) experiencing this problem has a support contract with SUSE, Red Hat, or IBM and opens up a support request, more effort is likely to go into figuring out a fix.

Mark Post

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
