On Mon, Apr 05, 2021 at 04:47:30PM +0300, Sam Eiderman wrote:
> We also looked at udev settle call points in the logs and it seems that it is
> called a lot of times before.
>
> The bug I mentioned is
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616689 and they
> also discuss that maybe udev settle is not working as intended.

So I don't know, but it should be relatively easy to tell.

Firstly you can modify appliance/init to add very verbose debugging to
udev.  Uncomment --debug here:

https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/appliance/init#L107

Additionally or instead you could modify daemon/utils.c to do
"ls -lR /dev/" before and after the udevadm settle command, which
should show if the additional device nodes are present before and/or
after the settle command.  That would be a pretty good way to tell if
udevadm settle is having the effect we think it should.

> The kernel version of the appliance (as can be seen in the log) is 4.19
>
> > Collecting the full logs is the right approach to diagnosing this.
>
> I added the full log for the first failure logs, I think we can see from there
> that udev settle is called but the file does not exist yet.

Do you have the full logs from the second case?

> We thought that maybe if we explicitly add the following logic right after
> g.launch() it might help:
>
> 1. For each device returned by: lvm 'lvs' '-o' 'vg_name,lv_name' '-S'
>    'lv_role=public && lv_skip_activation!=yes' '--noheadings' '--separator' '/'
>   1.1. stat the device /dev/vg/lv
>   1.2. if stat fails on device does not exist - wait
>   1.3. Go back to 1
>
> If we wait for too long, relaunch guestfs.

It'd be a bit of a hack.  Probably better to try to work out what's
going wrong first of all.  It should be possible to tell from the
kernel, udev and libguestfs logs.

Rich.

> However it would be nicer to maybe implement this inside guestfs.launch()
> itself
>
> Sam
>
>
> On Mon, Apr 5, 2021 at 3:45 PM Richard W.M. Jones <rjo...@redhat.com> wrote:
>
> On Mon, Apr 05, 2021 at 02:47:51PM +0300, Sam Eiderman wrote:
> > Hi,
> >
> > We are experiencing very rare LVM failures - 2 failures so far, in
> > different OSs, in different libguestfs functions.
> >
> > The first failure is inspect_os() not finding the root operating
> > system on rhel7.4.
> > LVM volumes are returned by lvm command but files under /dev do not
> > exist (yet?)
> >
> > Second failure is in is_lv() - is_lv() successfully enumerates all lvm
> > volumes but then internal stat() command fails again on /dev file
> > since it does not exist (yet?) (rhel8.0)
> >
> > All of our tests run in parallel, 1 guestfs instance per core on a 32
> > core machine and they run on GCP (nested virtualization).
> >
> > What we think is happening here is that libguestfs' appliance is
> > booting somewhat slower than usual and that the links to some
> > devices do not appear yet (even after multiple seconds).
> > We found this old issue that might be connected to this behavior (in
> > some way): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616689
>
> I wonder if "udevadm settle" is not working?  The daemon will use this
> command at various times in order to ensure that all preceding udev
> messages have been processed and all /dev changes have been made.
>
> It is called once at appliance boot:
>
> https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/appliance/init#L109
>
> And throughout the daemon code:
>
> https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/daemon/utils.c#L732
>
> $ git grep 'udev_settle ()' -- daemon
> daemon/blockdev.c: udev_settle ();
> daemon/cryptsetup.ml: udev_settle ()
> daemon/cryptsetup.ml: udev_settle ()
> daemon/file.c: udev_settle ();
> daemon/guestfsd.c: udev_settle ();
> daemon/hotplug.c: udev_settle ();
> [etc etc]
>
> It could be that udev_settle is not being called at the right points,
> or is not working in the way we understand.
>
> ...
> > Short second failure logs (is_lv() only) - notice that is_lv() is
> > invoked on /dev/vg_myvg/lv_var but it fails due to a problem in
> > /dev/rhel/swap not existing)
> >
> > 2021-03-07 10:58:53 T libguestfs - 0 - enter - is_lv
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: => aug_get (0x13) took 0.00 secs
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: <= is_lv (0x108) request length 64 bytes
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - commandrvf: stdout=n stderr=y flags=0x0
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - commandrvf: udevadm --debug settle -E /dev/vg_myvg/lv_var
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm 'lvs' '-o' 'vg_name,lv_name' '-S' 'lv_role=public && lv_skip_activation!=yes' '--noheadings' '--separator' '/'
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm returned 0
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - command: lvm: stdout:
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - rhel/root
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - rhel/swap
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - vg_myvg/lv_var
> > 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: error: stat: /dev/rhel/swap: No such file or directory
>
> You might want to look earlier in this log to see if udevadm settle
> was called between the LVs being activated and this API function.  If
> it was not being called then possibly we need to insert a call after
> activation.  If it was being called then perhaps udev settle is not
> working the way we understand it.
>
> Collecting the full logs is the right approach to diagnosing this.
>
> The only other issue I can think of is the change in kernel PCI device
> enumeration code (starting in Linux 5.6,
> https://bugzilla.redhat.com/show_bug.cgi?id=1804207).
> I suppose in theory the underlying devices might not be ready at all
> before we run udev settle in the appliance.  However I have not seen
> this actually happen.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> libguestfs lets you edit virtual machines.  Supports shell scripting,
> bindings from many languages.  http://libguestfs.org

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs
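
For reference, the stat-and-wait loop proposed in the thread (list the LVs, check each /dev node, wait, retry, give up after a timeout) could be sketched roughly as below. This is only an illustration of that workaround, which the reply above calls a bit of a hack, and not anything taken from libguestfs itself. The function names and the list_lvs / node_exists callables are hypothetical placeholders: the caller would have to supply whatever actually lists the LVs and tests for the device node in its own environment.

```python
import time
from typing import Callable, Iterable, List


def missing_lv_nodes(lv_names: Iterable[str],
                     node_exists: Callable[[str], bool]) -> List[str]:
    """Return the /dev paths of the listed LVs whose device node is absent.

    lv_names are "vg/lv" strings such as those printed by lvs with
    '--separator' '/'; node_exists is a caller-supplied (hypothetical) check.
    """
    missing = []
    for name in lv_names:
        name = name.strip()  # lvs output lines may carry leading whitespace
        if not name:
            continue
        path = name if name.startswith("/dev/") else "/dev/" + name
        if not node_exists(path):
            missing.append(path)
    return missing


def wait_for_lv_nodes(list_lvs: Callable[[], Iterable[str]],
                      node_exists: Callable[[str], bool],
                      timeout: float = 10.0,
                      interval: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Poll until every LV has a device node or the timeout expires.

    Returns the paths that are still missing (an empty list on success).
    The timeout and interval defaults are arbitrary assumptions.
    """
    deadline = clock() + timeout
    while True:
        # Re-list the LVs on every pass, as in step 1.3 of the proposal.
        missing = missing_lv_nodes(list_lvs(), node_exists)
        if not missing or clock() >= deadline:
            return missing
        sleep(interval)
```

A caller would treat a non-empty return value as the "waited too long" case and decide there whether to relaunch. Logging the returned paths alongside the udev debug output might also help show whether the nodes appear late or never appear.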