On Mon, Jul 2, 2018 at 9:32 AM Yedidyah Bar David <[email protected]> wrote:

> Hi all,
>
> I noticed that our hosted-engine suites [1] have been failing often lately, and
> decided to have a look at [2], which are on 4.2, which should
> hopefully be "rock solid" and basically never fail.
>
> I looked at these, [3][4][5][6][7], which are all the ones that still
> appear in [2] and are marked as failed.
>
> Among them:
>
> - All but one failed while "Waiting for agent to be ready" and timing
> out after 10 minutes, as part of 008_restart_he_vm.py, which was added
> a month ago [8] and then patched [9].
>
> - The other one [7] failed while "Waiting for engine to migrate", also
> eventually timing out after 10 minutes, as part of
> 010_local_mainentance.py, which was also added in [9].
>
> I also had a look at the last ones that succeeded, builds 329 to 337
> of [2]. There:
>
> - "Waiting for agent to be ready" took between 26 and 48 seconds
>
> - "Waiting for engine to migrate" took between 69 and 82 seconds
>
> Assuming these numbers are reasonable (which might be debatable), 10
> minutes indeed sounds like a reasonable timeout, and I think we should
> handle each failure specifically. Did anyone check them? Was it an
> infra issue/load/etc.? A bug? Something else?
>
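
To put rough numbers on the headroom discussed above, here is a minimal
sketch in Python. It uses only the durations quoted in the message; the
names TIMEOUT_S, slowest_success_s and headroom() are illustrative
assumptions and are not taken from the OST code.

```python
# Timeout quoted in the message above: 10 minutes, in seconds.
TIMEOUT_S = 10 * 60

# Slowest successful durations quoted above (builds 329 to 337), in seconds.
slowest_success_s = {
    "Waiting for agent to be ready": 48,
    "Waiting for engine to migrate": 82,
}


def headroom(timeout_s, observed_s):
    """Return how many times larger the timeout is than the observed duration."""
    return timeout_s / observed_s


# Ratio of the timeout to the slowest successful run, per step.
headrooms = {
    step: headroom(TIMEOUT_S, seconds)
    for step, seconds in slowest_success_s.items()
}

for step, ratio in headrooms.items():
    print(f"{step}: timeout is {ratio:.1f}x the slowest successful run")
```

With the quoted figures, the timeout is about 12.5 times the slowest
successful agent wait and about 7.3 times the slowest successful
migration wait, so a run that hits the timeout is well outside the
range seen in the successful builds.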

Suites should be monitored by their respective maintainers; the infra team
has neither the capacity nor the resources
to monitor every new suite that runs in CI.

Having said that, if a specific issue is reported, whether it is a Lago,
OST or infra issue, we'll of course do our best to find and fix it.


>
> I didn't check the logs yet, might do this later. Also didn't check
> the failures in other jobs in [1].
>
> Best regards,
>
> [1] https://jenkins.ovirt.org/search/?q=he-basic
>
> [2]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/
>
> [3]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/310/consoleFull
>
> [4]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/320/consoleFull
>
> [5]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/321/consoleFull
>
> [6]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/328/consoleFull
>
> [7]
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-ansible-suite-4.2/336/
>
> [8] https://gerrit.ovirt.org/91952
>
> [9] https://gerrit.ovirt.org/92341
> --
> Didi
>


-- 

Eyal Edri


MANAGER

RHV DevOps

EMEA VIRTUALIZATION R&D


Red Hat EMEA <https://www.redhat.com/>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/WNJVYVQL25CRJXVH44VSBWP2R6VBWFVC/