[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38031#comment-38031
 ] 

Barak Korren commented on OVIRT-2498:
-------------------------------------

[~pkotas] I think the best place to discuss KubeVirt issues is on the 
KubeVirt-related mailing lists, where other KubeVirt developers can see the 
discussion.

To your questions:

{quote}
I am working on fixing the issues in the KubeVirt e2e test suites. This
task is directly related to the unstable CI, due to unknown errors.
The progress is reported in the CNV Trello:
https://trello.com/c/HNXcMEQu/161-epic-improve-ci

I am creating this issue since KubeVirt experiences random timeouts on
random tests most of the time when the test suites run. From the outside,
the issue shows up as timeouts in different parts of the tests. Sometimes
the tests fail in the setup phase, again due to a random timeout. The
example in the link below timed out on a network connection to localhost.

[check-patch.k8s-1.11.0-dev.el7.x86_64]
requests.exceptions.ReadTimeout:
UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
(read timeout=60)
{quote}

It's generally a bad idea to rely too heavily on timeouts in a test suite like 
this. We've seen such issues over and over again in OST as well. It's probably 
best to remove all such per-operation timeout definitions and just have one 
overall timeout set for the entire test suite.
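The overall-deadline approach can be sketched roughly as follows (a minimal illustration, not KubeVirt's actual CI code; the entry-point command, the deadline value, and the exit code are assumptions):

```python
import subprocess

def run_with_deadline(cmd, deadline_s):
    """Run the whole suite under one generous deadline, instead of
    scattering 60-second read timeouts through individual operations."""
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # One clear failure mode at the very end, instead of random
        # per-request ReadTimeouts mid-run.
        return 124  # the exit code GNU timeout uses for a timeout

# Hypothetical usage; the real entry point would be the suite's own script:
# run_with_deadline(["./automation/test.sh"], 3 * 60 * 60)
```

This way a slow-but-progressing run is never killed by an arbitrary 60-second read timeout; only a run that exceeds the total budget fails, and it fails in one well-defined place.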

{quote}
An example of a failing test suite is here:
https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/1916/consoleText

The list of errors related to the failing CI can be found in my notes:
https://docs.google.com/document/d/1_ll1DOMHgCRHn_Df9i4uvtRFyMK-bDCHEeGfJFTjvjU/edit#heading=h.vcfoo8hi48ul

I am not sure whether KubeVirt has already shared its resource requirements,
so I provide a short summary:
Resources for KubeVirt e2e tests:

    at least 12GB of RAM - we start 3 nodes (3 Docker containers), each
    requiring 4GB of RAM
    an exposed /dev/kvm to enable native virtualization
    cached images, since these are used to build the test cluster:
    kubevirtci/os-3.10.0-crio:latest
    kubevirtci/os-3.10.0-multus:latest
    kubevirtci/os-3.10.0:latest
    kubevirtci/k8s-1.10.4:latest
    kubevirtci/k8s-multus-1.11.1:latest
    kubevirtci/k8s-1.11.0:latest

How can we overcome this? Can we work together to build suitable
requirements for running the tests so they pass every time?
{quote}

To my knowledge, the existing setup meets all the requirements you specify above.

We have 3 physical hosts that are used to run KubeVirt tests. Each host has 
128GB of RAM and runs 7 containers, where each container runs its own Libvirt, 
Docker and systemd so that it looks like a standalone host to the tests running 
inside. The number of containers per host was chosen so that each container 
has a little over 16GB of RAM to itself (128GB / 7 is roughly 18GB), so we 
should have more than enough for KubeVirt. Also, in our measurements, 
KubeVirt's CI tests took far less than 12GB - somewhere around 8GB.

All the images that start with 'kubevirtci' are cached by the system.
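For completeness, priming that cache after a node-image update could look something like the sketch below. The image list is taken from the requirements quoted above; the `docker pull` invocation and the injectable `runner` are assumptions for illustration, not the CI system's actual caching mechanism.

```python
import subprocess

# Images the KubeVirt e2e suite needs pre-cached (from the list above).
IMAGES = [
    "kubevirtci/os-3.10.0-crio:latest",
    "kubevirtci/os-3.10.0-multus:latest",
    "kubevirtci/os-3.10.0:latest",
    "kubevirtci/k8s-1.10.4:latest",
    "kubevirtci/k8s-multus-1.11.1:latest",
    "kubevirtci/k8s-1.11.0:latest",
]

def prime_cache(runner=subprocess.run):
    """Pre-pull every image so test runs never pay the download cost.
    `runner` is injectable so the loop can be exercised without Docker."""
    for img in IMAGES:
        runner(["docker", "pull", "docker.io/" + img], check=True)
```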

WRT /dev/kvm - we do have it exposed in the containers we run, but I think that 
is irrelevant since, AFAIK, KubeVirt CI runs qemu on its own inside its own 
container, so the /dev/kvm device file simply needs to exist inside that 
container.
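A quick way to verify that precondition from inside the container is a check like the following (an illustrative snippet, not part of the CI code):

```python
import os
import stat

def kvm_usable(path="/dev/kvm"):
    """True if the KVM device node exists, is a character device, and is
    readable/writable - the precondition nested qemu needs in the container."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing node (or no permission to stat it) means no KVM.
        return False
    return stat.S_ISCHR(st.st_mode) and os.access(path, os.R_OK | os.W_OK)
```

Running this inside the test container would distinguish "device not exposed" failures from genuine qemu problems early, before the suite starts.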



> Failing KubeVirt CI
> -------------------
>
>                 Key: OVIRT-2498
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2498
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>            Reporter: Petr Kotas
>            Assignee: infra
>
> Hi,
> I am working on fixing the issues on the KubeVirt e2e test suites. This
> task is directly related to unstable CI, due to unknown errors.
> The progress is reported in the CNV trello:
> https://trello.com/c/HNXcMEQu/161-epic-improve-ci
> I am creating this issue since the KubeVirt experience random timeouts on
> random tests most of the times when test suites run.
> The issue from outside is showing as timeouts on difference part of tests.
> Sometimes the tests fails in set up phase, again due to random timeout.
> The example in the link bellow timed out for network connection on
> localhost.
> [check-patch.k8s-1.11.0-dev.el7.x86_64]
> requests.exceptions.ReadTimeout:
> UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
> (read timeout=60)
> Example of failing test suites is here
> https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/1916/consoleText
> The list of errors related to the failing CI can be found in my notes
> https://docs.google.com/document/d/1_ll1DOMHgCRHn_Df9i4uvtRFyMK-bDCHEeGfJFTjvjU/edit#heading=h.vcfoo8hi48ul
> I am not sure whether KubeVirt already shared the resource requirements, so
> I provide short summary:
> *Resources for KubeVirt e2e tests:*
>    - at least 12GB of RAM - we start 3 nodes (3 docker images) each require
>    4GB of RAM
>    - exposed /dev/kvm to enable native virtualization
>    - cached images, since these are used to build the test cluster:
>       - kubevirtci/os-3.10.0-crio:latest
>       - kubevirtci/os-3.10.0-multus:latest
>       - kubevirtci/os-3.10.0:latest
>       - kubevirtci/k8s-1.10.4:latest
>       - kubevirtci/k8s-multus-1.11.1:latest
>       - kubevirtci/k8s-1.11.0:latest
> How can we overcome this? Can we work together to build a suitable
> requirements for running the tests so it passes each time?
> Kind regards,
> Petr Kotas



--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100092)
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/TBMNT7PWYIPLZRIY3F6ESVIB4M465DVB/
