On Fri, Jun 18, 2021 at 10:18 AM Marcin Sobczyk <msobc...@redhat.com> wrote:
>
> On 6/17/21 6:59 PM, Yedidyah Bar David wrote:
> > On Thu, Jun 17, 2021 at 6:27 PM Marcin Sobczyk <msobc...@redhat.com> wrote:
> >>
> >> On 6/17/21 1:44 PM, Yedidyah Bar David wrote:
> >>> On Wed, Jun 16, 2021 at 1:23 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I just tried running hc-basic-suite-master locally with a patched OST,
> >>>> and it failed due to $subject. I checked and see that this also
> >>>> happened on CI, e.g. [1], before it started failing due to an unrelated
> >>>> reason later:
> >>>>
> >>>> E TASK [gluster.infra/roles/firewall_config : Add/Delete services to firewalld rules] ***
> >>>> E failed: [lago-hc-basic-suite-master-host-0] (item=glusterfs) => {"ansible_loop_var": "item", "changed": false, "item": "glusterfs", "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs' not among existing services Permanent and Non-Permanent(immediate) operation, Services are defined by port/tcp relationship and named as they are in /etc/services (on most systems)"}
> >>>> E failed: [lago-hc-basic-suite-master-host-2] (item=glusterfs) => {"ansible_loop_var": "item", "changed": false, "item": "glusterfs", "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs' not among existing services Permanent and Non-Permanent(immediate) operation, Services are defined by port/tcp relationship and named as they are in /etc/services (on most systems)"}
> >>>> E failed: [lago-hc-basic-suite-master-host-1] (item=glusterfs) => {"ansible_loop_var": "item", "changed": false, "item": "glusterfs", "msg": "ERROR: Exception caught: org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs' not among existing services Permanent and Non-Permanent(immediate) operation, Services are defined by port/tcp relationship and named as they are in /etc/services (on most systems)"}
> >>>>
> >>>> This seems similar to [2], and indeed I can't see the package
> >>>> 'glusterfs-server' installed locally on host-0. Any idea?
> >>>
> >>> I think I understand:
> >>>
> >>> It seems like the deployment of hc relied on the order of running the
> >>> deploy scripts as written in lagoinitfile. With the new deploy code,
> >>> all of them run in parallel. Does this make sense?
> >>
> >> The scripts run in parallel as in "on all VMs at the same time", but
> >> sequentially as in "one script at a time on each VM" - this is the
> >> same behavior we had with lago deployment.
> >
> > Well, I do not think it works as intended, then. When running locally,
> > I logged into host-0, and after it failed, I had:
> >
> > # dnf history
> > ID | Command line | Date and time | Action(s) | Altered
> > ------------------------------------------------------------------------
> >  4 | install -y --nogpgcheck ansible gluster-ansible-roles ovirt-hosted-engine-setup ovirt-ansible-hosted-engine-setup ovirt-ansible-reposit | 2021-06-17 11:54 | I, U | 8
> >  3 | -y --nogpgcheck install ovirt-host python3-coverage vdsm-hook-vhostmd | 2021-06-08 02:15 | Install | 493 EE
> >  2 | install -y dnf-utils https://resources.ovirt.org/pub/yum-repo/ovirt-release-master.rpm | 2021-06-08 02:14 | Install | 1
> >  1 | | 2021-06-08 02:06 | Install | 511 EE
> >
> > Meaning, it already ran setup_first_host.sh (and failed there), but
> > didn't run hc_setup_host.sh, although it appears before it.
> >
> > If you check [1], which is a build that failed due to this reason
> > (unlike the later ones), you see there:
> >
> > ------------------------------ Captured log setup ------------------------------
> > 2021-06-07 01:58:38+0000,594 INFO [ost_utils.pytest.fixtures.deployment] Waiting for SSH on the VMs (deployment:40)
> > 2021-06-07 01:59:11+0000,947 INFO [ost_utils.deployment_utils.package_mgmt] oVirt packages used on VMs: (package_mgmt:133)
> > 2021-06-07 01:59:11+0000,948 INFO [ost_utils.deployment_utils.package_mgmt] vdsm-4.40.70.2-1.git34cdc8884.el8.x86_64 (package_mgmt:135)
> > 2021-06-07 01:59:11+0000,950 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh on lago-hc-basic-suite-master-host-1 (scripts:36)
> > 2021-06-07 01:59:11+0000,950 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh on lago-hc-basic-suite-master-host-2 (scripts:36)
> > 2021-06-07 01:59:11+0000,952 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh on lago-hc-basic-suite-master-host-0 (scripts:36)
> > 2021-06-07 01:59:13+0000,260 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh on lago-hc-basic-suite-master-host-1 (scripts:36)
> > 2021-06-07 01:59:13+0000,370 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh on lago-hc-basic-suite-master-host-0 (scripts:36)
> > 2021-06-07 01:59:13+0000,526 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh on lago-hc-basic-suite-master-host-2 (scripts:36)
> > 2021-06-07 01:59:15+0000,250 INFO [ost_utils.deployment_utils.scripts] Running /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/setup_first_host.sh on lago-hc-basic-suite-master-host-0 (scripts:36)
> >
> > So you see that hc_setup_host.sh was at least logged as being started
> > _after_ setup_host.sh, but very _close_ to it - I can't believe it
> > finished in 2 seconds. This part of the log is the same also for later
> > runs, although they fail earlier. You can compare this with the log of
> > the last successful run (using lago deploy), which also does not very
> > clearly show when each script finished, but at least logs their start
> > in the correct order.
> >
> > That said, I do not think the solution should be to now spend time on
> > investigating this, finding the root cause, and fixing - I think we
> > should instead stop keeping the list of deploy scripts in lagoinitfile
> > but move them simply to python code.
>
> [1] is unfortunately already gone - please ping me when you notice this
> kind of behavior again.

OK, it seems to be unrelated. Should be fixed with something like:

https://gerrit.ovirt.org/c/ovirt-system-tests/+/115318

hc is not ready yet, though - no urgency in merging this.

>
> Regards, Marcin
>
> > Best regards,
> >
> >> Regards, Marcin
> >>
> >>>> Thanks and best regards,
> >>>>
> >>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-master/2088/
> >>>>
> >>>> [2] https://github.com/oVirt/ovirt-ansible/issues/124
> >>>> --
> >>>> Didi
> >>>
> >>> --
> >>> Didi
> >>>
>

--
Didi
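
For reference, the ordering described in the thread ("on all VMs at the same time", but "one script at a time on each VM") can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the actual OST deployment code: `deploy` and `run_script` are hypothetical names, and `run_script` stands in for whatever really executes a script on a VM and blocks until it exits. The script names are the ones that appear in the CI log above.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy(scripts_by_vm, run_script):
    """Run each VM's scripts in list order; handle different VMs concurrently.

    scripts_by_vm: dict mapping a VM name to an ordered list of scripts.
    run_script: hypothetical callable(vm, script) that blocks until the
        script has finished on that VM and raises on failure.
    """

    def deploy_one(vm):
        # Sequential on a single VM: the next script starts only after
        # the previous run_script() call has returned.
        for script in scripts_by_vm[vm]:
            run_script(vm, script)
        return vm

    if not scripts_by_vm:
        return []

    # One worker per VM, so all VMs are processed at the same time.
    with ThreadPoolExecutor(max_workers=len(scripts_by_vm)) as pool:
        # list() waits for every VM and re-raises the first failure.
        return list(pool.map(deploy_one, scripts_by_vm))


if __name__ == "__main__":
    # Assumed layout: the per-VM script lists kept as plain Python data.
    scripts = {
        "host-0": ["setup_host.sh", "hc_setup_host.sh", "setup_first_host.sh"],
        "host-1": ["setup_host.sh", "hc_setup_host.sh"],
        "host-2": ["setup_host.sh", "hc_setup_host.sh"],
    }
    # Stand-in runner that only prints what it would run.
    deploy(scripts, lambda vm, script: print(f"{vm}: {script}"))
```

With the list kept as Python data like this, the per-VM list order is the only thing that decides the execution order on that VM, and a runner that returns before the script has actually finished would show up as exactly the kind of too-close start times seen in the log.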