On 4/6/21 9:55 AM, Yedidyah Bar David wrote:
On Tue, Apr 6, 2021 at 9:24 AM Marcin Sobczyk <[email protected]> wrote:
Hi,

On 4/6/21 7:23 AM, Yedidyah Bar David wrote:
On Mon, Apr 5, 2021 at 5:53 AM <[email protected]> wrote:
Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1974/
FYI: This failed twice in a row (1973 and 1974), for the same reason.
I reproduced it locally and looked into it a bit, but failed to find the
root cause. When I connected to host-1's console, it was stuck in
emergency mode after reboot. I checked a bit, and there was an error
about kdump failing to read the kernel image
( /boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64 ), although when I tried
manually as root I did manage to read it. I rebooted, and the VM came up
fine. I decided to try OST again, cleaned up and ran it, and opened a
'lago console' on the VM after it was up, but OST passed. I tried again,
and it passed again. Then I manually ran 1975 in CI and it passed, and
the nightly 1976 also passed. So I am going to ignore this for now.

I think we need a patch to make lago/OST log the consoles of all the VMs.
I might try to work on this.
Also stumbled upon this. Please take a look at
https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/114050/
Yes, I did notice this change and wondered if it's related...

But it's not merged yet, and still HE passed at least 4 times (two locally,
two on CI). Obviously this does not prove that the issue is fixed.

Anyway, in addition to merely fixing it (which perhaps your patch does),
I also wanted to emphasize the importance of making such cases easier
to fix in the future. How did you manage to find the root cause?
My case was similar: the HE suite was failing for me constantly. I noticed
that host-1 drops to the emergency shell, so I just used 'virsh console'
to get inside and went through the logs. That's when I spotted the problem
with the additional '/var/tmp' disk. I tried the fix on my machine and the
HE suite started working again. Moments later I tried running the HE suite
without the patch and it was successful again.

I couldn't figure out the real cause behind these problems,
but removing the unnecessary additional disk from host-1 seemed
to do the trick.

+1 for logging the consoles of the VMs - that should help with these
kinds of problems in the future.

Regards, Marcin


Best regards,
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/3H2HXEGUTWYV23EL7QT6NJETCLHN6MWG/
