On Mon, Nov 2, 2020 at 2:27 PM Benny Zlotnik <bzlot...@redhat.com> wrote:

Issues like this belong on the oVirt devel mailing list.

> looks like live merge failed[1]:
> 2020-11-01 10:31:49,903+0100 ERROR (periodic/0) [virt.vm] 
> (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') Unable to get watermarks for 
> drive vdb: invalid argument: invalid path 
> /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5
>  not assigned to domain (vm:1213)

This may be a bug in the vdsm live merge flow, trying to monitor a volume
after the volume was already removed, or it may be a libvirt/qemu bug.

> ...
> 2020-11-01 10:31:53,138+0100 ERROR (jsonrpc/1) [virt.vm] 
> (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') merge: libvirt does not support 
> volume chain monitoring. Unable to perform live merge. drive: vdb, alias: 
> ua-fbb11a06-b8ef-4078-9530-978e7ca8ea0b, chains: {} (vm:5411)
>
> libvirt logs report:
> ...
> 2020-11-01 09:30:50.021+0000: 40137: error : virProcessRunInFork:1161 : 
> internal error: child reported (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: error : virProcessRunInFork:1161 : 
> internal error: child reported (status=125): internal error: child reported 
> (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: warning : 
> qemuDomainSnapshotDiskUpdateSource:15582 : Unable to move disk metadata on vm 
> vm0
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 : 
> internal error: unable to execute QEMU command 'blockdev-del': Node 
> libvirt-6-format is in use
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 : 
> internal error: unable to execute QEMU command 'blockdev-del': Block device 
> libvirt-6-storage is in use
> 2020-11-01 09:31:45.540+0000: 40134: error : qemuMonitorJSONCheckError:412 : 
> internal error: unable to execute QEMU command 'blockdev-del': Node 
> 'libvirt-7-format' is busy: node is used as backing hd of 'libvirt-6-format'
> 2020-11-01 09:31:45.541+0000: 40134: error : qemuMonitorJSONCheckError:412 : 
> internal error: unable to execute QEMU command 'blockdev-del': Block device 
> libvirt-7-storage is in use
> 2020-11-01 09:31:45.900+0000: 40133: error : qemuDomainGetBlockInfo:12272 : 
> invalid argument: invalid path 
> /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5
>  not assigned to domain

These smell like libvirt/qemu bugs.

Is this reproducible with RHEL 8.3?

> Looks like the issue previously discussed in "[rhev-devel] Live storage 
> migration instability in OST" two months ago has resurfaced
>
>
>
> Another issue seems to be the removal of the source disk:
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.StorageDomainManifest] 
> removed image dir: 
> /rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/3ca0e492-45f2-4383-b149-439043408bce/images/_remove_me_fbb11a06-b8ef-4078-9530-978e7ca8ea0b
>  can't be removed (fileSD:258)
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.TaskManager.Task] 
> (Task='70db80c2-076a-4ba1-a65d-821e6b5fe52c') Unexpected error (task:880)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 251, 
> in purgeImage
>     self.oop.os.rmdir(toDelDir)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 
> 238, in rmdir
>     self._iop.rmdir(path)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 550, in 
> rmdir
>     return self._sendCommand("rmdir", {"path": path}, self.timeout)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in 
> _sendCommand
>     raise OSError(errcode, errstr)
> OSError: [Errno 39] Directory not empty
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in 
> _run
>     return fn(*args, **kargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in 
> run
>     return self.cmd(*self.argslist, **self.argsdict)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, 
> in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1947, in 
> purgeImage
>     domain.purgeImage(sdUUID, imgUUID, volsByImg, discard)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 855, in 
> purgeImage
>     self._manifest.purgeImage(sdUUID, imgUUID, volsImgs, discard)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 259, 
> in purgeImage
>     raise se.ImageDeleteError("%s %s" % (imgUUID, str(e)))
> vdsm.storage.exception.ImageDeleteError: Could not remove all image's 
> volumes: ('fbb11a06-b8ef-4078-9530-978e7ca8ea0b [Errno 39] Directory not 
> empty',)
>
> But it's unclear what the leftover is

May be a leftover from a previous failed LSM. I think we need a better
error message here; it should list the files in the non-empty directory.

> [1] 
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12782/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>
>
> On Mon, Nov 2, 2020 at 1:44 PM Steven Rosenberg <srose...@redhat.com> wrote:
>>
>> Dear Benny,
>>
>> Thank you for your response.
>>
>> Here is the timeout engine log from one of the ps 45 failures [1].
>>
>> It seems like this timeout is related to the engine failing; the OST 
>> scripts are not designed to detect the failure, thus timing out:
>>
>> 2020-11-01 10:31:55,175+01 ERROR 
>> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] 
>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) 
>> [live_storage_migration] Command id: '60b3f7fc-93db-48b2-82a1-8a93c47e18e1 
>> failed child command status for step 'MERGE_STATUS'
>>
>>
>>
>>
>> [1] 
>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12796/artifact/check-patch.basic_suite_master.el7.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>
>> With Best Regards.
>>
>> Steven.
>>
>>
>>
>> On Mon, Nov 2, 2020 at 12:51 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
>>>
>>> Can you link to the relevant engine/vdsm logs?
>>> The timeout in the tests indicates that the desired state wasn't
>>> reached, so the test logs don't provide information about what exactly
>>> happened
>>>
>>> On Sun, Nov 1, 2020 at 5:10 PM Steven Rosenberg <srose...@redhat.com> wrote:
>>> >
>>> > Dear virt-devel,
>>> >
>>> > We are currently experiencing many timeout failures in various patch sets 
>>> > for gerrit patch 111395 [1].
>>> >
>>> > The timeouts occur intermittently and seem to be unrelated to the changes 
>>> > which are only in the 004 module [2] and should have only affected VM1 / 
>>> > Disk1.
>>> >
>>> > We could use some advice on addressing these issues as well as a review 
>>> > of the patch to ensure we can move this patch forward. The patch sets and 
>>> > relevant timeouts are as follows:
>>> >
>>> > PS 40:
>>> >
>>> > test_live_storage_migration – test 004 [3]
>>> >
>>> > PS 41:
>>> >
>>> > on test_verify_engine_backup – test 002 [4]
>>> >
>>> > PS 43:
>>> >
>>> > on test_virtual_machines - test 100 [5]
>>> >
>>> > PS 45:
>>> >
>>> >  on test_live_storage_migration – 004 [6]
>>> >
>>> >
>>> >
>>> >
>>> > [1] https://gerrit.ovirt.org/#/c/111395/
>>> > [2] basic-suite-master/test-scenarios/004_basic_sanity.py
>>> > [3] 
>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12628/testReport/junit/basic-suite-master.test-scenarios/004_basic_sanity/Invoking_jobs___check_patch_basic_suite_master_el8_x86_64___test_live_storage_migration/
>>> > [4] 
>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12659/testReport/junit/basic-suite-master.test-scenarios/002_bootstrap/test_verify_engine_backup/
>>> > [5] 
>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12744/testReport/junit/basic-suite-master.test-scenarios/100_basic_ui_sanity/Invoking_jobs___check_patch_basic_suite_master_el7_x86_64___test_virtual_machines_chrome_/
>>> > [6] 
>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/12782/testReport/junit/basic-suite-master.test-scenarios/004_basic_sanity/Invoking_jobs___check_patch_basic_suite_master_el8_x86_64___test_live_storage_migration/
>>> >
>>> > With Best Regards.
>>> >
>>> > Steven
>>> >
>>> >
>>>
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/SA3XA65UVR5B7XXNETPUCI4227V322M7/
