On Fri, Jun 18, 2021 at 10:18 AM Marcin Sobczyk <msobc...@redhat.com> wrote:
>
>
>
> On 6/17/21 6:59 PM, Yedidyah Bar David wrote:
> > On Thu, Jun 17, 2021 at 6:27 PM Marcin Sobczyk <msobc...@redhat.com> wrote:
> >>
> >>
> >> On 6/17/21 1:44 PM, Yedidyah Bar David wrote:
> >>> On Wed, Jun 16, 2021 at 1:23 PM Yedidyah Bar David <d...@redhat.com> 
> >>> wrote:
> >>>> Hi,
> >>>>
> >>>> I now tried running hc-basic-suite-master locally with a patched OST,
> >>>> and it failed due to $subject. I checked and saw that this also
> >>>> happened on CI, e.g. [1], before it started failing due to an unrelated
> >>>> reason later:
> >>>>
> >>>> E           TASK [gluster.infra/roles/firewall_config : Add/Delete
> >>>> services to firewalld rules] ***
> >>>> E           failed: [lago-hc-basic-suite-master-host-0]
> >>>> (item=glusterfs) => {"ansible_loop_var": "item", "changed": false,
> >>>> "item": "glusterfs", "msg": "ERROR: Exception caught:
> >>>> org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs'
> >>>> not among existing services Permanent and Non-Permanent(immediate)
> >>>> operation, Services are defined by port/tcp relationship and named as
> >>>> they are in /etc/services (on most systems)"}
> >>>> E           failed: [lago-hc-basic-suite-master-host-2]
> >>>> (item=glusterfs) => {"ansible_loop_var": "item", "changed": false,
> >>>> "item": "glusterfs", "msg": "ERROR: Exception caught:
> >>>> org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs'
> >>>> not among existing services Permanent and Non-Permanent(immediate)
> >>>> operation, Services are defined by port/tcp relationship and named as
> >>>> they are in /etc/services (on most systems)"}
> >>>> E           failed: [lago-hc-basic-suite-master-host-1]
> >>>> (item=glusterfs) => {"ansible_loop_var": "item", "changed": false,
> >>>> "item": "glusterfs", "msg": "ERROR: Exception caught:
> >>>> org.fedoraproject.FirewallD1.Exception: INVALID_SERVICE: 'glusterfs'
> >>>> not among existing services Permanent and Non-Permanent(immediate)
> >>>> operation, Services are defined by port/tcp relationship and named as
> >>>> they are in /etc/services (on most systems)"}
> >>>>
> >>>> This seems similar to [2], and indeed I can't see the package
> >>>> 'glusterfs-server' installed locally on host-0. Any idea?
> >>> I think I understand:
> >>>
> >>> It seems like the deployment of hc relied on the order of running the 
> >>> deploy
> >>> scripts as written in lagoinitfile. With the new deploy code, all of them 
> >>> run
> >>> in parallel. Does this make sense?
> >> The scripts run in parallel as in "on all VMs at the same time", but
> >> sequentially
> >> as in "one script at a time on each VM" - this is the same behavior we
> >> had with lago deployment.
> > Well, I do not think it works as intended, then. When running locally,
> > I logged into host-0, and after it failed, I had:
> >
> > # dnf history
> > ID | Command line                                                                                                                            | Date and time    | Action(s) | Altered
> > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  4 | install -y --nogpgcheck ansible gluster-ansible-roles ovirt-hosted-engine-setup ovirt-ansible-hosted-engine-setup ovirt-ansible-reposit | 2021-06-17 11:54 | I, U      |       8
> >  3 | -y --nogpgcheck install ovirt-host python3-coverage vdsm-hook-vhostmd                                                                   | 2021-06-08 02:15 | Install   |  493 EE
> >  2 | install -y dnf-utils https://resources.ovirt.org/pub/yum-repo/ovirt-release-master.rpm                                                 | 2021-06-08 02:14 | Install   |       1
> >  1 |                                                                                                                                         | 2021-06-08 02:06 | Install   |  511 EE
> >
> > Meaning, it already ran setup_first_host.sh (and failed there), but
> > didn't run hc_setup_host.sh, although it appears before it.
> >
> > If you check [1], which is a build that failed due to this reason
> > (unlike the later ones), you see there:
> >
> > ------------------------------ Captured log setup 
> > ------------------------------
> > 2021-06-07 01:58:38+0000,594 INFO
> > [ost_utils.pytest.fixtures.deployment] Waiting for SSH on the VMs
> > (deployment:40)
> > 2021-06-07 01:59:11+0000,947 INFO
> > [ost_utils.deployment_utils.package_mgmt] oVirt packages used on VMs:
> > (package_mgmt:133)
> > 2021-06-07 01:59:11+0000,948 INFO
> > [ost_utils.deployment_utils.package_mgmt]
> > vdsm-4.40.70.2-1.git34cdc8884.el8.x86_64 (package_mgmt:135)
> > 2021-06-07 01:59:11+0000,950 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh
> > on lago-hc-basic-suite-master-host-1 (scripts:36)
> > 2021-06-07 01:59:11+0000,950 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh
> > on lago-hc-basic-suite-master-host-2 (scripts:36)
> > 2021-06-07 01:59:11+0000,952 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/common/deploy-scripts/setup_host.sh
> > on lago-hc-basic-suite-master-host-0 (scripts:36)
> > 2021-06-07 01:59:13+0000,260 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh
> > on lago-hc-basic-suite-master-host-1 (scripts:36)
> > 2021-06-07 01:59:13+0000,370 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh
> > on lago-hc-basic-suite-master-host-0 (scripts:36)
> > 2021-06-07 01:59:13+0000,526 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/hc_setup_host.sh
> > on lago-hc-basic-suite-master-host-2 (scripts:36)
> > 2021-06-07 01:59:15+0000,250 INFO
> > [ost_utils.deployment_utils.scripts] Running
> > /home/jenkins/workspace/ovirt-system-tests_hc-basic-suite-master/ovirt-system-tests/hc-basic-suite-master/setup_first_host.sh
> > on lago-hc-basic-suite-master-host-0 (scripts:36)
> >
> > So you see that hc_setup_host.sh was at least logged as starting
> > _after_ setup_host.sh, but very _close_ to it - I can't believe
> > setup_host.sh finished in 2 seconds. This part of the log is the same
> > for later runs too, although they fail earlier. You can compare this
> > with the log of the last successful run (using lago deploy), which
> > also does not show very clearly when each script finished, but at
> > least logs their starts in the correct order.
> >
> > That said, I do not think the solution should be to spend time now on
> > investigating this, finding the root cause, and fixing it - I think we
> > should instead stop keeping the list of deploy scripts in lagoinitfile
> > and simply move it to Python code.
> [1] is unfortunately already gone - please ping me when you notice this
> kind of behavior again.

OK, it seems to be unrelated. Should be fixed with something like:

https://gerrit.ovirt.org/c/ovirt-system-tests/+/115318

hc is not ready yet, though - no urgency in merging this.
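For the record, the intended semantics discussed above - scripts running
in parallel across hosts, but strictly one after another on each host -
could be sketched roughly like this (hypothetical names and data, not
the actual ost_utils code; the real run would invoke the scripts over
SSH instead of just recording them):

```python
# Sketch: run each host's deploy scripts strictly in order, while the
# hosts themselves proceed in parallel. Hypothetical stand-in for the
# real deployment code; script names taken from the log above.
import threading
from concurrent.futures import ThreadPoolExecutor

DEPLOY_SCRIPTS = {
    "host-0": ["setup_host.sh", "hc_setup_host.sh", "setup_first_host.sh"],
    "host-1": ["setup_host.sh", "hc_setup_host.sh"],
    "host-2": ["setup_host.sh", "hc_setup_host.sh"],
}

log = []
log_lock = threading.Lock()

def run_script(host, script):
    # Stand-in for "ssh <host> bash <script>"; here we only record
    # the execution order so it can be inspected afterwards.
    with log_lock:
        log.append((host, script))

def deploy_host(host, scripts):
    # Sequential per host: the next script starts only after the
    # previous one has finished, preserving the list's order.
    for script in scripts:
        run_script(host, script)

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(deploy_host, h, s)
               for h, s in DEPLOY_SCRIPTS.items()]
    for f in futures:
        f.result()  # propagate any failure from a host's deployment
```

The point of the sketch is that keeping the per-host script list in
Python makes the ordering guarantee explicit, instead of depending on
how the lagoinitfile happens to be interpreted.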

>
> Regards, Marcin
>
> > Best regards,
> >
> >> Regards, Marcin
> >>
> >>>> Thanks and best regards,
> >>>>
> >>>> [1] 
> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-master/2088/
> >>>>
> >>>> [2] https://github.com/oVirt/ovirt-ansible/issues/124
> >>>> --
> >>>> Didi
> >>>
> >>> --
> >>> Didi
> >>>
> >
>


-- 
Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/OWODY6Q3QBZHPVBIUZTZ4DMNRGPMIEEA/
