Is the following log snippet, taken from [1], relevant to this thread?

  # resize_and_refresh_storage_domain: Success (in 0:00:04)
  # add_vm2_lease:
    * Collect artifacts:
        ~ [Thread-620] Copy from
lago-basic-suite-4-3-engine:/tmp/otopi* to
/dev/shm/ost/deployment-basic-suite-4.3/default/test_logs/002_bootstrap.add_vm2_lease-20190308151804/lago-basic-suite-4-3-engine/_tmp_otopi*:
ERROR (in 0:00:00)
      - [Thread-620] lago-basic-suite-4-3-engine: ERROR (in 0:00:00)
    * Collect artifacts: ERROR (in 0:00:01)
  # add_vm2_lease: ERROR (in 0:00:03)
  # Results located at
/dev/shm/ost/deployment-basic-suite-4.3/002_bootstrap.py.junit.xml
@ Run test: 002_bootstrap.py: ERROR (in 0:05:17)
Error occured, aborting
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 383, in do_run
    self.cli_plugins[args.ovirtverb].do_run(args)
  File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line
184, in do_run
    self._do_run(**vars(args))
  File "/usr/lib/python2.7/site-packages/lago/utils.py", line 549, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/lago/utils.py", line 560, in wrapper
    return func(*args, prefix=prefix, **kwargs)
  File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 107,
in do_ovirt_runtest
    raise RuntimeError('Some tests failed')
RuntimeError: Some tests failed


[1] 
https://jenkins.ovirt.org/blue/organizations/jenkins/ovirt-4.3_change-queue-tester/detail/ovirt-4.3_change-queue-tester/165/pipeline


On Fri, Mar 8, 2019 at 12:21 AM Nir Soffer <[email protected]> wrote:

> On Thu, Mar 7, 2019 at 8:50 PM Dafna Ron <[email protected]> wrote:
>
>> I reported this issue to the list, and the test owner is more than
>> welcome to respond.
>> Galit, can you please review the skip-test patch
>> https://gerrit.ovirt.org/#/c/98191/
>>
>
> I looked at the engine and vdsm logs, and I don't think the add_vm2_lease
> test is related, so there is no point in disabling it.
>
> It seems that just after the resize_and_refresh_storage_domain test finished:
>
> 2019-02-27 13:37:21,036-0500 INFO (jsonrpc/5) [vdsm.api] START
> resizePV(sdUUID=u'34311cc1-c4d2-4cfe-88b5-dd5ad72261d3',
> spUUID=u'717575a9-7818-460d-ba3a-d5bdd8ef9ed3',
> guid=u'360014054e5952dc174c4a12b971ea45c', options=None)
> from=::ffff:192.168.201.4,43920,
> flow_id=f204cfe2-ef48-429d-ab33-e1175d0530a0,
> task_id=5908a867-5e11-474f-8119-49cfe219d4c8 (api:48)
> 2019-02-27 13:37:21,499-0500 INFO (jsonrpc/5) [vdsm.api] FINISH resizePV
> return={'size': '24293408768'} from=::ffff:192.168.201.4,43920,
> flow_id=f204cfe2-ef48-429d-ab33-e1175d0530a0,
> task_id=5908a867-5e11-474f-8119-49cfe219d4c8 (api:54)
>
> Engine tries to look up the VM lease. The lookup fails (expected, since
> there is no such lease):
>
> 2019-02-27 13:37:21,968-0500 INFO  (jsonrpc/1) [vdsm.api] START
> lease_info(lease={u'sd_id': u'b370b91d-00b5-4f62-9270-ac0acd47d075',
> u'lease_id': u'3500eb82-e5e2-4e24-b41c-ea02d9f6adee'})
> from=::ffff:192.168.201.4,43920,
> flow_id=117dec74-ad59-4b12-8148-b2c130337c10,
> task_id=9c297d41-0aa7-4c74-b268-b710e666bc6c (api:48)
>
> 2019-02-27 13:37:21,987-0500 INFO  (jsonrpc/1) [vdsm.api] FINISH
> lease_info error=No such lease 3500eb82-e5e2-4e24-b41c-ea02d9f6adee
> from=::ffff:192.168.201.4,43920,
> flow_id=117dec74-ad59-4b12-8148-b2c130337c10,
> task_id=9c297d41-0aa7-4c74-b268-b710e666bc6c (api:52)
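>
> As an aside, the lookup can be reproduced by hand on the host. This is a
> minimal sketch, assuming vdsm's client library and a Lease.info verb that
> takes the lease dict shown in the log (the names here are my reading of
> the log, not verified against this vdsm version):
>
>     from vdsm import client
>
>     # Connect to the local vdsm (default JSON-RPC port).
>     cli = client.connect("localhost", 54321)
>     try:
>         info = cli.Lease.info(lease={
>             "sd_id": "b370b91d-00b5-4f62-9270-ac0acd47d075",
>             "lease_id": "3500eb82-e5e2-4e24-b41c-ea02d9f6adee",
>         })
>         print(info)
>     except client.ServerError as e:
>         # Expected here: "No such lease 3500eb82-..."
>         print("lookup failed:", e)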
>
> On the engine side, we see immediately after that:
>
> 2019-02-27 13:37:22,078-05 DEBUG
> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
> (default task-1) [117dec74-ad59-4b12-8148-b2c130337c10] method:
> getListForSpmSelection, params: [717575a9-7818-460d-ba3a-d5bdd8ef9ed3],
> timeElapsed: 12ms
>
> Which means nothing to me, but it seems that engine treats the expected
> lease lookup error as a fatal SPM error.
>
> Soon after, engine tries to stop the SPM on host-0, which makes no sense:
>
> 2019-02-27 13:37:22,125-05 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default
> task-1) [117dec74-ad59-4b12-8148-b2c130337c10] SpmStopVDSCommand::Stopping
> SPM on vds 'lago-basic-suite-master-host-0', pool id
> '717575a9-7818-460d-ba3a-d5bdd8ef9ed3'
>
> At the same time, host 1 has trouble seeing the LUNs and fails to
> connect to the pool:
>
> 2019-02-27 13:37:20,014-0500 INFO  (jsonrpc/4) [vdsm.api] START
> getDeviceList(storageType=3, guids=[u'360014055ad799e3a968444abdaefa323',
> u'360014054e5952dc174c4a12b971ea45c'], checkStatus=False, options={})
> from=::ffff:192.168.201.4,33922,
> flow_id=f204cfe2-ef48-429d-ab33-e1175d0530a0,
> task_id=78fb7d20-8bec-4561-9867-da4174d181a6 (api:48)
>
> 2019-02-27 13:37:20,340-0500 INFO  (jsonrpc/4) [vdsm.api] FINISH
> getDeviceList return={'devList': []} from=::ffff:192.168.201.4,33922,
> flow_id=f204cfe2-ef48-429d-ab33-e1175d0530a0,
> task_id=78fb7d20-8bec-4561-9867-da4174d181a6 (api:54)
>
> The host does not see any LUNs.
>
> So of course it cannot see the master storage domain.
>
> 2019-02-27 13:37:22,718-0500 INFO  (jsonrpc/6) [vdsm.api] FINISH
> connectStoragePool error=Cannot find master domain:
> u'spUUID=717575a9-7818-460d-ba3a-d5bdd8ef9ed3, msdUUID=3
>
> So we probably go into the SPM recovery flow, which takes some time, and
> adding the lease is just a victim of that, since adding leases requires
> the SPM.
>
> We need to answer these questions:
>
> - why does host 1 not see storage? We have seen this many times in the
> past.
> - why does engine try to stop the SPM on host 0, which has no issue?
>   (maybe related to incorrect handling of the "no such lease" error?)
> - why did the previous tests not detect that host 1 was not connected to
> storage?
> - why did the previous test not wait until both hosts are connected to
> storage? (a possible shape for such a wait is sketched below)
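>
> As a sketch only (the use of ovirtsdk4 and every name here are my
> assumptions, not the actual OST helpers), such a wait could look like:
>
>     import time
>
>     import ovirtsdk4.types as types
>
>     def wait_for_hosts_up(connection, timeout=300, interval=5):
>         """Poll engine until every host is UP, or fail after timeout."""
>         hosts_service = connection.system_service().hosts_service()
>         deadline = time.time() + timeout
>         while time.time() < deadline:
>             hosts = hosts_service.list()
>             if hosts and all(h.status == types.HostStatus.UP
>                              for h in hosts):
>                 return
>             time.sleep(interval)
>         raise RuntimeError("timed out waiting for all hosts to be UP")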
>
> We can try to increase the timeout of the add-lease test, so we can
> recover if the SPM is lost for some reason during this test, but this
> only hides the real storage issues.
>
> Since this is not reproducible, and we don't understand the storage
> failures, we should run OST again after failures and keep the failing
> logs so we can investigate them later.
>
> Strangely, we don't experience this issue on master - I did 20-30 OST
> runs in the last 2 weeks, and the only issue I have seen is a random
> host_devices failure. Why does the change queue have more failures? What
> is the difference between OST master runs and the change queue?
>
> Nir
>
>
>>
>> Thanks,
>> Dafna
>>
>>
>> On Tue, Mar 5, 2019 at 2:43 PM Nir Soffer <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, Mar 5, 2019, 13:27 Eyal Shenitzky <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 5, 2019 at 12:58 PM Dafna Ron <[email protected]> wrote:
>>>>
>>>>> Tal,
>>>>>
>>>>> I see the bug is in POST but the patch has not been merged yet:
>>>>> https://gerrit.ovirt.org/#/c/98191/
>>>>>
>>>>> Can you tell me when the patch will be merged? I would rather not
>>>>> add a SkipTest if it will be merged soon.
>>>>>
>>>>> thanks,
>>>>> Dafna
>>>>>
>>>>>
>>>>> On Mon, Mar 4, 2019 at 10:42 AM Dafna Ron <[email protected]> wrote:
>>>>>
>>>>>> As I had another failure of this today, I will be disabling this
>>>>>> test until the issue is resolved (https://bugzilla.redhat.com/1684267).
>>>>>>
>>>>>> Thanks,
>>>>>> Dafna
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 28, 2019 at 8:48 PM Nir Soffer <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Feb 28, 2019 at 11:52 AM Dafna Ron <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We have a failure in the basic suite on the master branch. The
>>>>>>>> most recent failure was in patch:
>>>>>>>> https://gerrit.ovirt.org/#/c/98087/1 - Add pre-seed for
>>>>>>>> ovirt-web-ui
>>>>>>>>
>>>>>>>> CQ points at the patch below as the root cause (it was merged a
>>>>>>>> while back):
>>>>>>>> https://gerrit.ovirt.org/#/c/97491/ - Add pre-seed for ovirt-web-ui
>>>>>>>>
>>>>>>>> Can you please check the issue? It seems both patches are
>>>>>>>> changing the same thing, and the project seems to be broken since
>>>>>>>> https://gerrit.ovirt.org/#/c/97491/3
>>>>>>>>
>>>>>>>> Latest failure:
>>>>>>>>
>>>>>>>> https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/13171/
>>>>>>>>
>>>>>>>> Logs:
>>>>>>>>
>>>>>>>> https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/13171/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap.py/
>>>>>>>>
>>>>>>>> Errors from the logs:
>>>>>>>> Engine:
>>>>>>>>
>>>>>>>> 2019-02-27 13:37:28,479-05 ERROR
>>>>>>>> [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-1) [74283e25]
>>>>>>>> Transaction rolled-back for command
>>>>>>>> 'org.ovirt.engine.core.bll.UpdateVmCommand'.
>>>>>>>> 2019-02-27 13:37:28,483-05 ERROR
>>>>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>>>> (default task-1) [74283e25] EVENT_ID: USER_FAILED_UPDATE_VM(58),
>>>>>>>> Failed to update VM vm2 (User: admin@internal-authz).
>>>>>>>> 2019-02-27 13:37:28,485-05 INFO
>>>>>>>> [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-1) [74283e25]
>>>>>>>> Lock freed to object 'EngineLock:{exclusiveLocks='[vm2=VM_NAME]',
>>>>>>>> sharedLocks='[3500eb82-e5e2-4e24-b41c-ea02d9f6adee=VM]'}'
>>>>>>>> 2019-02-27 13:37:28,485-05 DEBUG
>>>>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
>>>>>>>> (default task-1) [74283e25] method: runAction, params: [UpdateVm,
>>>>>>>> VmManagementParametersBase:{commandId='34059769-05b9-429e-8356-f6b9b9953f55',
>>>>>>>> user='admin', commandType='UpdateVm',
>>>>>>>> vmId='3500eb82-e5e2-4e24-b41c-ea02d9f6adee'}], timeElapsed: 6618ms
>>>>>>>> 2019-02-27 13:37:28,486-05 ERROR
>>>>>>>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
>>>>>>>> task-1) [] Operation Failed: []
>>>>>>>> 2019-02-27 13:37:28,579-05 DEBUG
>>>>>>>> [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
>>>>>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-85) [] method: get, params:
>>>>>>>> [e29c0ba1-464c-4eb4-a8f2-c6933d99969a], timeElapsed: 3ms
>>>>>>>>
>>>>>>>>
>>>>>>>> vdsm:
>>>>>>>>
>>>>>>>> 2019-02-27 13:37:21,987-0500 INFO  (jsonrpc/1) [vdsm.api] FINISH
>>>>>>>> lease_info error=No such lease 3500eb82-e5e2-4e24-b41c-ea02d9f6adee
>>>>>>>> from=::ffff:192.168.201.4,43920,
>>>>>>>> flow_id=117dec74-ad59-4b12-8148-b2c130337c10,
>>>>>>>>  task_id=9c297d41-0aa7-4c74-b268-b710e666bc6c (api:52)
>>>>>>>> 2019-02-27 13:37:21,988-0500 ERROR (jsonrpc/1)
>>>>>>>> [storage.TaskManager.Task] 
>>>>>>>> (Task='9c297d41-0aa7-4c74-b268-b710e666bc6c')
>>>>>>>> Unexpected error (task:875)
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py",
>>>>>>>> line 882, in _run
>>>>>>>>     return fn(*args, **kargs)
>>>>>>>>   File "<string>", line 2, in lease_info
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
>>>>>>>> 50, in method
>>>>>>>>     ret = func(*args, **kwargs)
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line
>>>>>>>> 3702, in lease_info
>>>>>>>>     info = dom.lease_info(lease.lease_id)
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line
>>>>>>>> 674, in lease_info
>>>>>>>>     return vol.lookup(lease_id)
>>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py",
>>>>>>>> line 553, in lookup
>>>>>>>>     raise NoSuchLease(lease_id)
>>>>>>>> NoSuchLease: No such lease 3500eb82-e5e2-4e24-b41c-ea02d9f6adee
>>>>>>>>
>>>>>>>
>>>>>>> This is not a vdsm error. Someone asked for information about a
>>>>>>> non-existing lease, and vdsm failed the request with the expected
>>>>>>> error:
>>>>>>>
>>>>>>>     NoSuchLease: No such lease 3500eb82-e5e2-4e24-b41c-ea02d9f6adee
>>>>>>>
>>>>>>> But since the error is not a public error, the vdsm logger handles
>>>>>>> it as an unexpected error and creates a lot of noise in the vdsm
>>>>>>> log.
>>>>>>>
>>>>>>> On the engine side, this creates even worse noise, as we can see here:
>>>>>>>
>>>>>>> https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/13171/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>>>>>> (look for 2019-02-27 13:37:21,996-05)
>>>>>>>
>>>>>>> All this noise could be avoided if vdsm used a proper public
>>>>>>> error. I filed https://bugzilla.redhat.com/1684267
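>>>>>>>
>>>>>>> For illustration, a minimal sketch of such a public error,
>>>>>>> assuming vdsm's convention of storage exceptions carrying a
>>>>>>> registered code and message (the base class and code value here
>>>>>>> are placeholders, not the actual fix):
>>>>>>>
>>>>>>>     # Hypothetical addition to vdsm/storage/exception.py, where
>>>>>>>     # public errors subclass GeneralException with a unique code.
>>>>>>>     class NoSuchLease(GeneralException):
>>>>>>>         code = 614  # placeholder; real codes are registered
>>>>>>>         message = "No such lease"
>>>>>>>
>>>>>>> With a public error, the API layer can log the failure normally
>>>>>>> and return a proper error code instead of a full traceback.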
>>>>>>>
>>>>>>> But this error is not related to the actual failing test; it is
>>>>>>> just noise hiding the real problem.
>>>>>>>
>>>>>>>
>>>> Hi,
>>>>
>>>> As Nir already said, this log is just noise; it is not the real
>>>> problem.
>>>>
>>>> I think we already talked about this issue, but I can't find the
>>>> thread.
>>>>
>>>> This lease failure in the log is expected and is not an error; please
>>>> ignore it in future failure analyses.
>>>>
>>>> From the log I can see the following error:
>>>>
>>>> {"jsonrpc": "2.0", "id": "6a42a9cf-e220-4916-ba0e-db2182978a5d", "error": 
>>>> {"message": "Cannot find master domain: 
>>>> u'spUUID=717575a9-7818-460d-ba3a-d5bdd8ef9ed3, 
>>>> msdUUID=34311cc1-c4d2-4cfe-88b5-dd5ad72261d3'", "code": 304}}
>>>> 2019-02-27 13:35:14,175-05 DEBUG 
>>>> [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) 
>>>> [] Message received: {"jsonrpc": "2.0", "id": 
>>>> "6a42a9cf-e220-4916-ba0e-db2182978a5d", "error": {"message": "Cannot find 
>>>> master domain: u'spUUID=717575a9-7818-460d-ba3a-d5bdd8ef9ed3, 
>>>> msdUUID=34311cc1-c4d2-4cfe-88b5-dd5ad72261d3'", "code": 304}}
>>>> 2019-02-27 13:35:14,182-05 ERROR 
>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
>>>> (EE-ManagedThreadFactory-engine-Thread-23) [73da8d8b] EVENT_ID: 
>>>> VDS_BROKER_COMMAND_FAILURE(10,802), VDSM lago-basic-suite-master-host-1 
>>>> command ConnectStoragePoolVDS failed: Cannot find master domain: 
>>>> u'spUUID=717575a9-7818-460d-ba3a-d5bdd8ef9ed3, 
>>>> msdUUID=34311cc1-c4d2-4cfe-88b5-dd5ad72261d3'
>>>> 2019-02-27 13:35:14,182-05 INFO  
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] 
>>>> (EE-ManagedThreadFactory-engine-Thread-23) [73da8d8b] Command 
>>>> 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' 
>>>> return value 'StatusOnlyReturn [status=Status [code=304, message=Cannot 
>>>> find master domain: u'spUUID=717575a9-7818-460d-ba3a-d5bdd8ef9ed3, 
>>>> msdUUID=34311cc1-c4d2-4cfe-88b5-dd5ad72261d3']]'
>>>> 2019-02-27 13:35:14,182-05 INFO  
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] 
>>>> (EE-ManagedThreadFactory-engine-Thread-23) [73da8d8b] HostName = 
>>>> lago-basic-suite-master-host-1
>>>>
>>>> This failure occurred before the lease info was requested (and failed
>>>> as expected).
>>>>
>>>> Therefore the fix for https://bugzilla.redhat.com/1684267 will not
>>>> solve anything related to this failure.
>>>>
>>>
>>> I agree with Eyal.
>>>
>>> Please find the owner of the failing test. Every test should have an
>>> owner. If there is no owner, the test should be deleted.
>>>
>>>
>>>>>>> Thanks for reporting this.
>>>>>>>
>>>>>>> Nir
>>>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Eyal Shenitzky
>>>>


-- 
Liora Milbaum
Senior Principal Software Engineer
RHV/CNV DevOps
EMEA VIRTUALIZATION R&D

T: +972-54-6560051