On Mon, Jul 16, 2018 at 2:19 PM, Steven Rosenberg <[email protected]>
wrote:

> Hi,
>
> I reran the test on: https://gerrit.ovirt.org/#/c/92734/
>
> Three tests passed and two failed on:
>
> 004_basic_sanity.vm_run
>
> The error is:
>
> detail: Cannot run VM. There is no host that satisfies current scheduling 
> constraints. See below for details:, The host 
> lago-he-basic-ansible-suite-master-host-0 did not satisfy internal filter 
> Memory because its available memory is too low (656 MB) to run the VM.
>
>
> It looks like there is not enough memory in the test environment, and that
> may be causing intermittent errors in the other tests as well.
>
> Is it possible to increase the memory for the CI environment so that we
> can ensure the errors are not due to lack of resources?
>

We already increased the RAM in https://gerrit.ovirt.org/#/c/92902/.
Can you please rebase your patch?

>
> With Best Regards.
>
> Steven Rosenberg.
>
>
> On Mon, Jul 16, 2018 at 12:03 PM Andrej Krejcir <[email protected]>
> wrote:
>
>> Hi,
>>
>> The Jenkins job is no longer available, so I will run the test manually
>> and see why it fails.
>>
>> Maybe this patch will fix the problem with the test:
>> https://gerrit.ovirt.org/#/c/92734/
>> It ensures that all services are stopped before starting them again.
>>
>> The purpose of the test is to try restarting the HE VM.
>> When the HE VM is run for the first time, the engine generates an OVF
>> file describing the HE VM. Then when the VM is started again, the OVF is
>> used. We wanted to test that the generated OVF is correct and the VM can be
>> started. The test was intended to cleanly shut down and restart vdsm.
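
As a rough illustration of the stop-before-start idea described above (this is not the code from the patch under review), the sequence could be sketched as follows. The service names come from the systemctl command in the log quoted in this thread; the function names and the injectable `run` parameter are hypothetical.

```python
import subprocess

# Service names as they appear in the "systemctl restart" log line in this
# thread. Listing them in this order is an assumption of the sketch.
SERVICES = ["vdsmd", "ovirt-ha-broker", "ovirt-ha-agent"]


def stop_then_start_commands(services=SERVICES):
    """Build two command lines: stop every service, then start every service."""
    stop = ["systemctl", "stop"] + list(services)
    start = ["systemctl", "start"] + list(services)
    return [stop, start]


def restart_cleanly(run=subprocess.check_call, services=SERVICES):
    """Run the stop command to completion before issuing the start command.

    `run` is injectable (hypothetical parameter) so the sequence can be
    exercised without touching real services.
    """
    for cmd in stop_then_start_commands(services):
        run(cmd)
```

Unlike a single `systemctl restart`, the start command is only issued after the stop command has returned for all listed services.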
>>
>>
>> Andrej
>>
>>
>> On Fri, 6 Jul 2018 at 11:38, Sandro Bonazzola <[email protected]>
>> wrote:
>>
>>> Failing job is: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-iscsi-suite-master/308
>>> My findings:
>>>
>>> 2018-07-06 03:28:05,461::008_restart_he_vm.py::_shutdown_he_vm::123::root::INFO::    * VM is down.
>>> 2018-07-06 03:28:05,461::008_restart_he_vm.py::_restart_services::127::root::INFO::    * Restarting services...
>>>
>>> 2018-07-06 03:28:05,729::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:f7db7960-b541-4d18-9a73-45b3d0677f03:Get ssh client for lago-he-basic-iscsi-suite-master-host-0:
>>> 2018-07-06 03:28:06,031::ssh.py::ssh::58::lago.ssh::DEBUG::Running 98d0de9e on lago-he-basic-iscsi-suite-master-host-0: systemctl restart vdsmd ovirt-ha-broker ovirt-ha-agent
>>> 2018-07-06 03:28:22,887::ssh.py::ssh::81::lago.ssh::DEBUG::Command 98d0de9e on lago-he-basic-iscsi-suite-master-host-0 returned with 0
>>>
>>>
>>> It then waits for the engine to come up again until it gives up 10
>>> minutes later:
>>>
>>> 2018-07-06 03:38:20,979::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for lago-he-basic-iscsi-suite-master-host-0:
>>> 2018-07-06 03:38:21,290::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for lago-he-basic-iscsi-suite-master-host-0:
>>> 2018-07-06 03:38:21,591::ssh.py::ssh::58::lago.ssh::DEBUG::Running 07b7f08a on lago-he-basic-iscsi-suite-master-host-0: hosted-engine --vm-status
>>> 2018-07-06 03:38:21,637::ssh.py::ssh::81::lago.ssh::DEBUG::Command 07b7f08a on lago-he-basic-iscsi-suite-master-host-0 returned with 1
>>> 2018-07-06 03:38:21,637::ssh.py::ssh::89::lago.ssh::DEBUG::Command 07b7f08a on lago-he-basic-iscsi-suite-master-host-0 output:
>>>  The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
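
The wait-and-give-up behaviour in the log above can be sketched as a generic polling loop. This is a sketch under assumptions, not the test suite's actual helper: `wait_for`, its parameters, and the 10-second interval are hypothetical; only the 10-minute limit comes from the log.

```python
import time


def wait_for(check, timeout=600, interval=10,
             clock=time.monotonic, sleep=time.sleep):
    """Call `check` repeatedly until it returns a true value or time runs out.

    Returns True as soon as `check()` succeeds, and False once `timeout`
    seconds have elapsed without success. `clock` and `sleep` are injectable
    (hypothetical parameters) so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might pass a check that runs `hosted-engine --vm-status` on the host and treats a zero exit status as success, though a zero exit status alone may not prove that the engine is healthy.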
>>>
>>>
>>> On host 0, vdsm.log shows it's restarting:
>>>
>>> 2018-07-05 23:28:19,646-0400 INFO  (MainThread) [vds] Exiting (vdsmd:171)
>>> 2018-07-05 23:28:23,514-0400 INFO  (MainThread) [vds] (PID: 19942) I am the actual vdsm 4.30.0-465.git1ad18aa.el7 lago-he-basic-iscsi-suite-master-host-0 (3.10.0-862.2.3.el7.x86_64) (vdsmd:149)
>>>
>>> vdsm then spends the 10 minutes mentioned above waiting for the
>>> storage pool to go up:
>>>
>>> 2018-07-05 23:38:20,739-0400 INFO  (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:704)
>>>
>>> while the HA agent tries to get the hosted engine stats:
>>>
>>> 2018-07-05 23:38:30,363-0400 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0', 'method': u'Host.getStats', 'id': u'cddde340-37a8-4f72-a471-b5bc40c06a16'} at 0x7f262814ae90> timeout=60, duration=600.00 at 0x7f262815c050> task#=0 at 0x7f262834b810>, traceback:
>>> File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
>>>   self.__bootstrap_inner()
>>> File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
>>>   self.run()
>>> File: "/usr/lib64/python2.7/threading.py", line 765, in run
>>>   self.__target(*self.__args, **self.__kwargs)
>>> File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
>>>   ret = func(*args, **kwargs)
>>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
>>>   self._execute_task()
>>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
>>>   task()
>>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
>>>   self._callable()
>>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 261, in __call__
>>>   self._handler(self._ctx, self._req)
>>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 304, in _serveRequest
>>>   response = self._handle_request(req, ctx)
>>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 344, in _handle_request
>>>   res = method(**params)
>>> File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>   result = fn(*methodArgs)
>>> File: "<string>", line 2, in getStats
>>> File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
>>>   ret = func(*args, **kwargs)
>>> File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1409, in getStats
>>>   multipath=True)}
>>> File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 78, in get_stats
>>>   ret['haStats'] = _getHaInfo()
>>> File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 176, in _getHaInfo
>>>   stats = instance.get_all_stats()
>>> File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
>>>   stats = broker.get_stats_from_storage()
>>> File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
>>>   result = self._proxy.get_stats()
>>> File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
>>>   return self.__send(self.__name, args)
>>> File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
>>>   verbose=self.__verbose
>>> File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
>>>   return self.single_request(host, handler, request_body, verbose)
>>> File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
>>>   response = h.getresponse(buffering=True)
>>> File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
>>>   response.begin()
>>> File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
>>>   version, status, reason = self._read_status()
>>> File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
>>>   line = self.fp.readline(_MAXLINE + 1)
>>> File: "/usr/lib64/python2.7/socket.py", line 476, in readline
>>>   data = self._sock.recv(self._rbufsize) (executor:363)
>>>
>>>
>>> Andrej, I see you added the test in commit
>>> e8d32f7375f2033b73544f47c1e1ca67abe8d35a
>>> I'm not sure about the purpose of this test, and I don't understand why
>>> we are restarting the services on the host.
>>>
>>> Nir, Tal, any idea why the storage pool is not coming up?
>>>
>>> I see vdsm is in recovery mode; I'm not sure whether this is what the test
>>> was supposed to do, or whether the intention was to cleanly shut down vdsm
>>> and cleanly restart it.
>>>
>>>
>>>
>>>
>>> --
>>>
>>> SANDRO BONAZZOLA
>>>
>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>
>>> Red Hat EMEA <https://www.redhat.com/>
>>>
>>> [email protected]
>>> <https://red.ht/sig>
>>>
>>
> _______________________________________________
> Devel mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/QONCFDY7AEZH2SV2ZCWFGGYIBZIBFJWP/
>
>


-- 
*GAL BEN HAIM*
RHV DEVOPS
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/NAUCG6HINTYZ5RBKVAEQDCANYGI3WXEL/
