Dear all,

I can confirm that udev settle always returns zero. I have even set the timeout to 60 seconds, with an exit as soon as the device node is available, and that is still not enough because udev exits. Without the exit option I had the timeout set to 30 seconds. Only when the loop runs twice do the chances of success become high.
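
A wait loop of this kind might look roughly like the sketch below. The function name wait_for_node, the one-second polling interval and the default timeout are illustrative assumptions, not the actual exec. Note that it only checks that the node exists, which by itself does not prove the device is really usable.

```shell
#!/bin/sh
# Hypothetical helper (not from the original exec): poll until a path
# exists or the timeout expires.
# Usage: wait_for_node PATH [TIMEOUT_SECONDS]
wait_for_node() {
    node=$1
    timeout=${2:-60}    # assumed default of 60 seconds
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -e is true for any existing path; a stricter script could
        # use -b to insist on a block device node.
        if [ -e "$node" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep; its status is the result.
    [ -e "$node" ]
}
```

It could be called after bringing the device online, for example: wait_for_node /dev/disk/by-path/ccw-0.0.0302-part1 30 || echo "node did not appear"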

As I wrote in my first mail, I ran into this problem a long time ago. I have always found workarounds, but with every kernel update there is a chance that the race condition comes back. In any case it seems that there is no reliable check that the device is really usable. It is always a gamble whether your procedure succeeds or not.

Since working on the clone procedure I see that the critical path is the number of steps necessary to make one device usable:

1. attach it (vmcp)
2. vary it online (chccwdev -e)
3. format it (dasdfmt)
4. partition it (fdasd)
5. make the file system

The most critical steps are 2 and 3. I see in my exec that most of the time both steps fail and need to be rerun. This is happening with the current kernel version on SLES 11 SP2. With SP1 usually only one of the two steps failed. On SLES 10 the problem was that the partition node didn't show up, or that after a certain number of (successful) chccwdevs the kernel could not bring the device online any more and a reboot of the guest was required.

sync or other tricks I know do not really solve the problem, and udev is disappointing too, since it reports that everything is safe, which is not the case. I used dasd_config from SLES 11 but it didn't solve the problem either. The chances are high that the situation arises between any two of these steps. Having a fixed 30-second delay between all the steps makes the process, including formatting and creation of the file system, quite long. Personally I would say unacceptably long.

We use an IBM z10 with a DS8700 for the disks. z/VM is 6.2 on the latest RSU. So it is original and fast equipment, and the situation still appears there.

Kind regards,
Florian

On Fri, Jul 27, 2012 at 8:23 PM, David Boyes <[email protected]> wrote:

> > I believe that's the piece that's missing (for most people). I can easily
> > reproduce the problem on my SLES11 SP2 system with this script:
> > vmcp define vfb-512 302 2000
> > date +%H:%M:%S.%N
> > chccwdev -e 0.0.0302
> > mkswap /dev/disk/by-path/ccw-0.0.0302-part1
>
> Yeah, that's pretty much guaranteed to fail. If you insert a 'udevadm
> settle' after the 'chccwdev -e', I still get a failure about 3 times out of
> 100 attempts, though.
>
> Alan may be on to something with the timeout value for udev for that type
> of device.
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or
> visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
