On Mon, Nov 2, 2020 at 2:27 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
Issues like this belong to the oVirt devel mailing list.

> looks like live merge failed[1]:
> 2020-11-01 10:31:49,903+0100 ERROR (periodic/0) [virt.vm]
> (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') Unable to get watermarks for
> drive vdb: invalid argument: invalid path
> /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5
> not assigned to domain (vm:1213)

This may be a bug in the vdsm live merge flow, trying to monitor a volume
after the volume was already removed, or it may be a libvirt/qemu bug.

> ...
> 2020-11-01 10:31:53,138+0100 ERROR (jsonrpc/1) [virt.vm]
> (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') merge: libvirt does not support
> volume chain monitoring. Unable to perform live merge. drive: vdb, alias:
> ua-fbb11a06-b8ef-4078-9530-978e7ca8ea0b, chains: {} (vm:5411)
>
> libvirt logs report:
> ...
> 2020-11-01 09:30:50.021+0000: 40137: error : virProcessRunInFork:1161 :
> internal error: child reported (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: error : virProcessRunInFork:1161 :
> internal error: child reported (status=125): internal error: child reported
> (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: warning :
> qemuDomainSnapshotDiskUpdateSource:15582 : Unable to move disk metadata on vm
> vm0
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 :
> internal error: unable to execute QEMU command 'blockdev-del': Node
> libvirt-6-format is in use
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 :
> internal error: unable to execute QEMU command 'blockdev-del': Block device
> libvirt-6-storage is in use
> 2020-11-01 09:31:45.540+0000: 40134: error : qemuMonitorJSONCheckError:412 :
> internal error: unable to execute QEMU command 'blockdev-del': Node
> 'libvirt-7-format' is busy: node is used as backing hd of 'libvirt-6-format'
> 2020-11-01 09:31:45.541+0000: 40134: error : qemuMonitorJSONCheckError:412 :
> internal error: unable to execute QEMU command 'blockdev-del': Block device
> libvirt-7-storage is in use
> 2020-11-01 09:31:45.900+0000: 40133: error : qemuDomainGetBlockInfo:12272 :
> invalid argument: invalid path
> /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5
> not assigned to domain

These smell like a libvirt/qemu bug. Is this reproducible with RHEL 8.3?

> Looks like the issue previously discussed in "[rhev-devel] Live storage
> migration instability in OST" two months ago has resurfaced
>
> Another issue seems to be the removal of the source disk:
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.StorageDomainManifest]
> removed image dir:
> /rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/3ca0e492-45f2-4383-b149-439043408bce/images/_remove_me_fbb11a06-b8ef-4078-9530-978e7ca8ea0b
> can't be removed (fileSD:258)
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.TaskManager.Task]
> (Task='70db80c2-076a-4ba1-a65d-821e6b5fe52c') Unexpected error (task:880)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 251, in purgeImage
>     self.oop.os.rmdir(toDelDir)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 238, in rmdir
>     self._iop.rmdir(path)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 550, in rmdir
>     return self._sendCommand("rmdir", {"path": path}, self.timeout)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
>     raise OSError(errcode, errstr)
> OSError: [Errno 39] Directory not empty
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
>     return fn(*args, **kargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
>     return self.cmd(*self.argslist, **self.argsdict)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1947, in purgeImage
>     domain.purgeImage(sdUUID, imgUUID, volsByImg, discard)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 855, in purgeImage
>     self._manifest.purgeImage(sdUUID, imgUUID, volsImgs, discard)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 259, in purgeImage
>     raise se.ImageDeleteError("%s %s" % (imgUUID, str(e)))
> vdsm.storage.exception.ImageDeleteError: Could not remove all image's
> volumes: ('fbb11a06-b8ef-4078-9530-978e7ca8ea0b [Errno 39] Directory not
> empty',)
>
> But it's unclear what the leftover is

Maybe a leftover from a previous failed LSM. I think we need a better error
message here; it should list the files in the non-empty directory.

> [1]
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12782/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>
>
> On Mon, Nov 2, 2020 at 1:44 PM Steven Rosenberg <srose...@redhat.com> wrote:
>>
>> Dear Benny,
>>
>> Thank you for your response.
>>
>> Here is the timeout engine log from one of the PS 45 failures [1].
>>
>> It seems like this timeout is related to the engine failing, and the OST
>> scripts are not designed to detect the failure, thus timing out:
>>
>> 2020-11-01 10:31:55,175+01 ERROR
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67)
>> [live_storage_migration] Command id: '60b3f7fc-93db-48b2-82a1-8a93c47e18e1
>> failed child command status for step 'MERGE_STATUS'
>>
>> [1]
>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12796/artifact/check-patch.basic_suite_master.el7.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>
>> With Best Regards.
>>
>> Steven.
>>
>> On Mon, Nov 2, 2020 at 12:51 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
>>>
>>> Can you link to the relevant engine/vdsm logs?
>>> The timeout in the tests indicates that the desired state wasn't
>>> reached, so the test logs don't provide the information about what
>>> exactly happened.
>>>
>>> On Sun, Nov 1, 2020 at 5:10 PM Steven Rosenberg <srose...@redhat.com> wrote:
>>> >
>>> > Dear virt-devel,
>>> >
>>> > We are currently experiencing many timeout failures in various patch sets
>>> > for gerrit change 111395 [1].
>>> >
>>> > The timeouts occur intermittently and seem to be unrelated to the changes,
>>> > which are only in the 004 module [2] and should have only affected VM1 /
>>> > Disk1.
>>> >
>>> > We could use some advice on addressing these issues, as well as a review
>>> > of the patch to ensure we can move this patch forward.
>>> > The patch sets and relevant timeouts are as follows:
>>> >
>>> > PS 40:
>>> > test_live_storage_migration – test 004 [3]
>>> >
>>> > PS 41:
>>> > on test_verify_engine_backup – test 002 [4]
>>> >
>>> > PS 43:
>>> > on test_virtual_machines - test 100 [5]
>>> >
>>> > PS 45:
>>> > on test_live_storage_migration – 004 [6]
>>> >
>>> > [1] https://gerrit.ovirt.org/#/c/111395/
>>> > [2] basic-suite-master/test-scenarios/004_basic_sanity.py
>>> > [3] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12628/testReport/junit/basic-suite-master.test-scenarios/004_basic_sanity/Invoking_jobs___check_patch_basic_suite_master_el8_x86_64___test_live_storage_migration/
>>> > [4] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12659/testReport/junit/basic-suite-master.test-scenarios/002_bootstrap/test_verify_engine_backup/
>>> > [5] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12744/testReport/junit/basic-suite-master.test-scenarios/100_basic_ui_sanity/Invoking_jobs___check_patch_basic_suite_master_el7_x86_64___test_virtual_machines_chrome_/
>>> > [6] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12782/testReport/junit/basic-suite-master.test-scenarios/004_basic_sanity/Invoking_jobs___check_patch_basic_suite_master_el8_x86_64___test_live_storage_migration/
>>> >
>>> > With Best Regards.
>>> >
>>> > Steven
>>> >
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/SA3XA65UVR5B7XXNETPUCI4227V322M7/