[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38036#comment-38036
 ] 

Petr Kotas commented on OVIRT-2498:
-----------------------------------

Hi Barak,

I understand and agree that the mailing list is a great place to share
knowledge. I will write a summary there once we come up with a solution for
this issue.
We need to find that solution quickly, as TLV has holidays approaching.

The issues I have described have been recurring for almost 3 months and are
blocking our progress.
We are already fixing the issue on our side and are working on additional
fixes to make the tests even more stable.

The other part is to make sure we are not crashing the CI.
Would you be able to give me monitoring access so I can see whether there
are any race conditions, or whether we deplete some resources?

Regarding the timeouts: we are not relying on them. The timeout you have
seen in the logs comes from your infrastructure.
It signals a networking issue: docker cannot connect to localhost, which
is weird.
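
To check this independently of the test suite, a small probe like the one
below could time a request to the Docker daemon on a CI node. This is only a
sketch under assumptions: it assumes the daemon listens on the default Unix
socket /var/run/docker.sock (the UnixHTTPConnectionPool in the quoted error
suggests a Unix socket, but the path itself is an assumption), and
ping_docker is a hypothetical helper name, not part of KubeVirt or the CI.
/_ping is the Docker Engine API health endpoint.

```python
import socket
import time


def ping_docker(sock_path="/var/run/docker.sock", timeout=5.0):
    """Send GET /_ping over the Docker Unix socket.

    Returns (ok, elapsed_seconds, detail). The socket path is an assumed
    default; adjust it to match the CI node.
    """
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(b"GET /_ping HTTP/1.0\r\nHost: localhost\r\n\r\n")
            reply = s.recv(4096)
    except OSError as exc:
        # Covers a missing socket, permission errors, and timeouts.
        return False, time.monotonic() - start, repr(exc)
    status_line = reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
    # A healthy daemon answers with a 200 status line.
    ok = status_line.split()[1:2] == ["200"]
    return ok, time.monotonic() - start, status_line


if __name__ == "__main__":
    ok, elapsed, detail = ping_docker()
    print(f"ok={ok} elapsed={elapsed:.3f}s detail={detail}")
```

Running it in a loop while a test job is active would show whether the
daemon stops answering under load, which would point at resource depletion
on the node rather than at the tests themselves.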

So please, can I have monitoring access? And can you please check whether
the network has any issues?

Thank you for your help! I appreciate it.

Best,
Petr



On Sun, Sep 16, 2018 at 8:08 AM Barak Korren (oVirt JIRA) <



> Failing KubeVirt CI
> -------------------
>
>                 Key: OVIRT-2498
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2498
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>            Reporter: Petr Kotas
>            Assignee: infra
>
> Hi,
> I am working on fixing issues in the KubeVirt e2e test suites. This
> task is directly related to the unstable CI, which fails with unknown
> errors.
> The progress is reported in the CNV trello:
> https://trello.com/c/HNXcMEQu/161-epic-improve-ci
> I am creating this issue since KubeVirt experiences random timeouts on
> random tests most of the time when the test suites run.
> From the outside, the issue shows up as timeouts in different parts of
> the tests.
> Sometimes the tests fail in the setup phase, again due to a random timeout.
> The example in the link below timed out on a network connection to
> localhost.
> [check-patch.k8s-1.11.0-dev.el7.x86_64]
> requests.exceptions.ReadTimeout:
> UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
> (read timeout=60)
> An example of a failing test suite run is here:
> https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/1916/consoleText
> The list of errors related to the failing CI can be found in my notes
> https://docs.google.com/document/d/1_ll1DOMHgCRHn_Df9i4uvtRFyMK-bDCHEeGfJFTjvjU/edit#heading=h.vcfoo8hi48ul
> I am not sure whether KubeVirt has already shared the resource
> requirements, so here is a short summary:
> *Resources for KubeVirt e2e tests:*
>    - at least 12GB of RAM - we start 3 nodes (3 docker images), each
>    requiring 4GB of RAM
>    - exposed /dev/kvm to enable native virtualization
>    - cached images, since these are used to build the test cluster:
>       - kubevirtci/os-3.10.0-crio:latest
>       - kubevirtci/os-3.10.0-multus:latest
>       - kubevirtci/os-3.10.0:latest
>       - kubevirtci/k8s-1.10.4:latest
>       - kubevirtci/k8s-multus-1.11.1:latest
>       - kubevirtci/k8s-1.11.0:latest
> How can we overcome this? Can we work together to define suitable
> requirements for running the tests so that they pass each time?
> Kind regards,
> Petr Kotas
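
The RAM and /dev/kvm requirements quoted above could be verified on a CI
node before the suite starts, with something like the sketch below. It is an
illustration only: mem_total_gib and preflight are hypothetical helper
names, the script assumes a Linux host with /proc/meminfo, and the 12 GiB
threshold is approximate because MemTotal is always slightly lower than the
installed RAM. It does not check the cached images.

```python
import os

REQUIRED_RAM_GIB = 12  # 3 nodes x 4GB each, per the quoted requirements


def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo text and convert kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def preflight(meminfo_path="/proc/meminfo", kvm_path="/dev/kvm"):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    try:
        with open(meminfo_path) as fh:
            total = mem_total_gib(fh.read())
        if total < REQUIRED_RAM_GIB:
            problems.append(
                f"only {total:.1f} GiB RAM, need about {REQUIRED_RAM_GIB}"
            )
    except (OSError, ValueError) as exc:
        problems.append(f"cannot read memory size: {exc}")
    # Native virtualization needs read/write access to the KVM device.
    if not os.access(kvm_path, os.R_OK | os.W_OK):
        problems.append(f"{kvm_path} is missing or not accessible")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PREFLIGHT:", problem)
```

Failing fast on such a check would separate "the node is too small" from
genuinely flaky tests in the job logs.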


