On Fri, 27 Jul 2012, Florian Bilek wrote:
> I can confirm that udev settle always returns zero. I have even set the
> timeout to 60 sec, with an exit if the device node is available, and that
> is still not enough because udev exits. Without the exit I had set the
> timeout to 30 secs. Only when the loop runs twice does the chance of
> success get high.
>
> As I wrote in my first mail, I encountered this problem a long time ago
> already. I have always found workarounds, but with every kernel update
> there is a chance that the race condition comes back.
>
> In any case, it seems that there isn't a reliable check that the device is
> really usable. It is always a gamble whether your procedure succeeds or not.

I found the other mail thread mentioned here and have a theory about
what went wrong. I blame the --exit-if-exists option of udev settle (which
should be ok in most cases but is not if you want to use dasdfmt
afterwards). For the sake of argument, let's assume that using udev settle
like that is the same as:
if [ ! -e /dev/dasdx ] ;then
  udevadm settle
fi
So sometimes you just wait for udev calling mknod but you don't wait for
udev finishing the other stuff it does with this device.

Since dasdfmt does the low-level formatting, it tries to make sure
it's the only user of the device. But in your case it looks like
sometimes it's not the only user, and that is most likely because some
udev worker has not finished yet and still holds an open file descriptor
on this device node.
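
To check this theory on an affected guest, one could poll the device node for
other openers before formatting. The following is only a diagnostic sketch:
wait_until_unused is a hypothetical helper, not part of s390-tools, and it
assumes that fuser (from psmisc) is installed and that the script runs as
root so that processes of other users are visible. A udev worker may hold the
node only briefly, so a single clean poll does not prove that the race is gone.

```shell
# Sketch only: poll until no process holds the device node open.
# wait_until_unused is a hypothetical helper name; fuser comes from psmisc.
wait_until_unused() {
    node=$1            # device node, e.g. /dev/dasdx
    tries=${2:-10}     # number of polls, one second apart (assumed default)
    while [ "$tries" -gt 0 ]; do
        # fuser -s exits 0 while at least one process has the file open
        fuser -s "$node" || return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1           # still in use after all polls
}
```

Called as "wait_until_unused /dev/dasdx 30", it returns 0 as soon as fuser
reports no opener and 1 if the node is still busy after 30 polls.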

So I still think it is sufficient to do:
chccwdev -e xxx ;udevadm settle ;dasdfmt xxx
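
For use in a clone script, the one-liner above can be wrapped so that each
step is checked before the next one runs. This is a minimal sketch under
stated assumptions: prepare_dasd is a hypothetical function name, the
60-second settle timeout is arbitrary, and the dasdfmt options (4096-byte
blocks, no confirmation prompt) are placeholders that need to match the
local setup.

```shell
# Sketch only: set the DASD online, wait for udev to go idle, then format.
# prepare_dasd and its two arguments are hypothetical.
prepare_dasd() {
    busid=$1    # bus ID, e.g. 0.0.0302
    node=$2     # device node, e.g. /dev/dasdx

    chccwdev -e "$busid" || return 1
    # Plain settle without --exit-if-exists: waits for the udev event
    # queue to drain, not merely for the device node to appear.
    udevadm settle --timeout=60 || return 2
    # Assumed options: -b block size, -y skip confirmation, -f device node
    dasdfmt -b 4096 -y -f "$node" || return 3
}
```

The return code tells the caller which step failed (1 for chccwdev, 2 for
udevadm settle, 3 for dasdfmt), which makes it easier to see where a retry
is actually needed.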

Regards,
Sebastian

>
> Since working on the clone procedure I see that the critical path is the
> number of steps necessary to make one device usable:
>
> 1. attach it (vmcp)
> 2. vary it online (chccwdev -e)
> 3. format it (dasdfmt)
> 4. partition it (fdasd)
> 5. make the file system
>
> The most critical steps are 2 and 3. I see in my exec that most of the
> time both steps fail and need to be rerun. This is happening with the
> current kernel version on SLES 11 SP2. On SP1 usually only one of the two
> steps failed.
>
> On SLES 10 the problem was that the partition node didn't show up, or after
> a certain number of (successful) chccwdevs the kernel could not bring the
> device online any more and a reboot of the guest was required.
>
> sync or other tricks I know do not really solve the problem, and udev is
> also disappointing since it reports that everything is safe, which is not
> the case. I used dasd_config from SLES 11 but it didn't solve the problem
> either.
>
> The chances are high that the situation arises between any two of these
> steps. Having a fixed 30-second delay between all the steps makes the
> process, including formatting and creation of the filesystem, quite long.
> Personally I would say unacceptably long.
>
> We use an IBM z/10 here with DS 8700 for the disks. z/VM is 6.2 on the
> latest RSU. So it is original and fast equipment, and the situation still
> appears there.
>
> Kind regards,
> Florian
>
> On Fri, Jul 27, 2012 at 8:23 PM, David Boyes <[email protected]> wrote:
>
> > > I believe that's the piece that's missing (for most people).  I can
> > > easily reproduce the problem on my SLES11 SP2 system with this script:
> > > vmcp define vfb-512 302 2000
> > > date +%H:%M:%S.%N
> > > chccwdev -e 0.0.0302
> > > mkswap /dev/disk/by-path/ccw-0.0.0302-part1
> >
> > Yeah, that's pretty much guaranteed to fail. If you insert a 'udevadm
> > settle' after the 'chccwdev -e', I still get a failure about 3 times out
> > of 100 attempts, though.
> >
> > Alan may be on to something with the timeout value for udev for that type
> > of device.
> >
> > ----------------------------------------------------------------------
> > For LINUX-390 subscribe / signoff / archive access instructions,
> > send email to [email protected] with the message: INFO LINUX-390 or
> > visit
> > http://www.marist.edu/htbin/wlvindex?LINUX-390
> > ----------------------------------------------------------------------
> > For more information on Linux on System z, visit
> > http://wiki.linuxvm.org/
> >

