Hi Dan,

In the last execution, the success rate was very low due to a large number
of VM start failures caused, according to Michal, by the
vdsm-hook-allocate_net hook that was installed on the host.

This is the latest status here. Would you like me to re-execute? If so,
with or without vdsm-hook-allocate_net installed?
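
For reference, the failure itself is a KeyError on the 'equivnets' custom
property: the hook reads os.environ['equivnets'] unconditionally, so any VM
started without that custom property kills the before_device_create hook and,
with it, the whole VM start (see the traceback below). A rough sketch of the
kind of guard that would let the hook degrade to a no-op instead - names taken
from the traceback, this is not the actual hook code:

import os
import sys

AVAIL_NETS_KEY = 'equivnets'  # custom property name, per the KeyError below


def _parse_nets():
    # The installed hook does os.environ[AVAIL_NETS_KEY].split(), which
    # raises KeyError when the VM has no 'equivnets' custom property.
    # Using .get() lets the hook skip allocation instead of failing.
    return os.environ.get(AVAIL_NETS_KEY, '').split()


def main():
    if not _parse_nets():
        sys.exit(0)  # nothing to allocate; don't make vdsm raise HookError
    # ... the original allocation logic would run here ...


if __name__ == '__main__':
    main()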

On Tue, May 29, 2018 at 1:14 PM, Dan Kenigsberg <[email protected]> wrote:

> On Mon, May 7, 2018 at 3:53 PM, Michal Skrivanek
> <[email protected]> wrote:
> > Hi Elad,
> > why did you install vdsm-hook-allocate_net?
> >
> > adding Dan as I think the hook is not supposed to fail this badly in any
> > case
>
> yep, this looks bad and deserves a little bug report. Installing this
> little hook should not block vm startup.
>
> But more importantly - what is the conclusion of this thread? Do we
> have a green light from QE to take this in?
>
>
> >
> > Thanks,
> > michal
> >
> > On 5 May 2018, at 19:22, Elad Ben Aharon <[email protected]> wrote:
> >
> > Start VM fails on:
> >
> > 2018-05-05 17:53:27,399+0300 INFO  (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') drive 'vda' path: 'dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' -> u'*dev=/rhev/data-center/mnt/blockSD/db5a6696-d907-4938-9a78-bdd13a843c62/images/6cdabfe5-d1ca-40af-ae63-9834f235d1c8/7ef97445-30e6-4435-8425-f35a01928211' (storagexml:334)
> > 2018-05-05 17:53:27,888+0300 INFO  (jsonrpc/1) [vdsm.api] START getSpmStatus(spUUID='940fe6f3-b0c6-4d0c-a921-198e7819c1cc', options=None) from=::ffff:10.35.161.127,53512, task_id=c70ace39-dbfe-4f5c-ae49-a1e3a82c2758 (api:46)
> > 2018-05-05 17:53:27,909+0300 INFO  (vm/e6ce66ce) [root] /usr/libexec/vdsm/hooks/before_device_create/10_allocate_net: rc=2 err=vm net allocation hook: [unexpected error]: Traceback (most recent call last):
> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>
> >    main()
> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main
> >    allocate_random_network(device_xml)
> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network
> >    net = _get_random_network()
> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network
> >    available_nets = _parse_nets()
> >  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets
> >    return [net for net in os.environ[AVAIL_NETS_KEY].split()]
> >  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
> >    raise KeyError(key)
> > KeyError: 'equivnets'
> >
> >
> > (hooks:110)
> > 2018-05-05 17:53:27,915+0300 ERROR (vm/e6ce66ce) [virt.vm] (vmId='e6ce66ce-852f-48c5-9997-5d2959432a27') The vm start process failed (vm:943)
> > Traceback (most recent call last):
> >  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
> >    self._run()
> >  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2861, in _run
> >    domxml = hooks.before_vm_start(self._buildDomainXML(),
> >  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2254, in _buildDomainXML
> >    dom, self.id, self._custom['custom'])
> >  File "/usr/lib/python2.7/site-packages/vdsm/virt/domxml_preprocess.py", line 240, in replace_device_xml_with_hooks_xml
> >    dev_custom)
> >  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 134, in before_device_create
> >    params=customProperties)
> >  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
> >    raise exception.HookError(err)
> > HookError: Hook Error: ('vm net allocation hook: [unexpected error]: Traceback (most recent call last):\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 105, in <module>\n    main()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 93, in main\n    allocate_random_network(device_xml)\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 62, in allocate_random_network\n    net = _get_random_network()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 50, in _get_random_network\n    available_nets = _parse_nets()\n  File "/usr/libexec/vdsm/hooks/before_device_create/10_allocate_net", line 46, in _parse_nets\n    return [net for net in os.environ[AVAIL_NETS_KEY].split()]\n  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__\n    raise KeyError(key)\nKeyError: \'equivnets\'\n\n\n',)
> >
> >
> >
> > Hence, the success rate was 28%, against 100% when running with the
> > downstream (d/s) build. If needed, I'll compare against the latest master,
> > but I think the d/s comparison already gives the picture.
> >
> > vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
> > libvirt-3.9.0-14.el7_5.3.x86_64
> > qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
> > kernel 3.10.0-862.el7.x86_64
> > rhel7.5
> >
> >
> > Logs attached
> >
> > On Sat, May 5, 2018 at 1:26 PM, Elad Ben Aharon <[email protected]>
> wrote:
> >>
> >> nvm, found gluster 3.12 repo, managed to install vdsm
> >>
> >> On Sat, May 5, 2018 at 1:12 PM, Elad Ben Aharon <[email protected]>
> >> wrote:
> >>>
> >>> No, vdsm requires it:
> >>>
> >>> Error: Package: vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64
> >>> (/vdsm-4.20.27-3.gitfee7810.el7.centos.x86_64)
> >>>           Requires: glusterfs-fuse >= 3.12
> >>>           Installed: glusterfs-fuse-3.8.4-54.8.el7.x86_64 (@rhv-4.2.3)
> >>>
> >>> Therefore, vdsm package installation is skipped upon force install.
> >>>
> >>> On Sat, May 5, 2018 at 11:42 AM, Michal Skrivanek
> >>> <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 5 May 2018, at 00:38, Elad Ben Aharon <[email protected]> wrote:
> >>>>
> >>>> Hi guys,
> >>>>
> >>>> The vdsm build from the patch requires glusterfs-fuse >= 3.12, whereas
> >>>> the latest 4.2.3-5 d/s build requires 3.8.4 (3.4.0.59rhs-1.el7)
> >>>>
> >>>>
> >>>> because it is still oVirt, not a downstream build. We can’t really do
> >>>> downstream builds with unmerged changes:/
> >>>>
> >>>> I'm trying to get this glusterfs-fuse build, so far with no luck.
> >>>> Is this requirement intentional?
> >>>>
> >>>>
> >>>> it should work regardless, I guess you can force install it without
> the
> >>>> dependency
> >>>>
> >>>>
> >>>> On Fri, May 4, 2018 at 2:38 PM, Michal Skrivanek
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Hi Elad,
> >>>>> to make it easier to compare, Martin backported the change to 4.2 so it
> >>>>> is actually comparable with a run without that patch. Would you please
> >>>>> try that out?
> >>>>> It would be best to have 4.2 upstream and this[1] run to really
> >>>>> minimize the noise.
> >>>>>
> >>>>> Thanks,
> >>>>> michal
> >>>>>
> >>>>> [1]
> >>>>> http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/28/
> >>>>>
> >>>>> On 27 Apr 2018, at 09:23, Martin Polednik <[email protected]>
> wrote:
> >>>>>
> >>>>> On 24/04/18 00:37 +0300, Elad Ben Aharon wrote:
> >>>>>
> >>>>> I will update with the results of the next tier1 execution on latest
> >>>>> 4.2.3
> >>>>>
> >>>>>
> >>>>> That isn't master but an old branch, though. Could you run it against
> >>>>> *current* VDSM master?
> >>>>>
> >>>>> On Mon, Apr 23, 2018 at 3:56 PM, Martin Polednik
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> On 23/04/18 01:23 +0300, Elad Ben Aharon wrote:
> >>>>>
> >>>>> Hi, I've triggered another execution [1] due to some issues I saw in
> >>>>> the first, which are not related to the patch.
> >>>>>
> >>>>> The success rate is 78%, which is low compared to tier1 executions with
> >>>>> code from downstream builds (95-100% success rates) [2].
> >>>>>
> >>>>>
> >>>>> Could you run the current master (without the dynamic_ownership patch)
> >>>>> so that we have a viable comparison?
> >>>>>
> >>>>> From what I could see so far, there is an issue with move and copy
> >>>>> operations to and from Gluster domains. For example [3].
> >>>>>
> >>>>> The logs are attached.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/testReport/
> >>>>>
> >>>>> [2]
> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1-after-upgrade/7/
> >>>>>
> >>>>>
> >>>>>
> >>>>> [3]
> >>>>> 2018-04-22 13:06:28,316+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' from=::ffff:10.35.161.182,40936, flow_id=disks_syncAction_ba6b2630-5976-4935, task_id=3d5f2a8a-881c-409e-93e9-aaa643c10e42 (api:51)
> >>>>> 2018-04-22 13:06:28,317+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') Unexpected error (task:875)
> >>>>> Traceback (most recent call last):
> >>>>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
> >>>>>    return fn(*args, **kargs)
> >>>>>  File "<string>", line 2, in deleteImage
> >>>>>  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
> >>>>>    ret = func(*args, **kwargs)
> >>>>>  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1503, in deleteImage
> >>>>>    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
> >>>>> ImageDoesNotExistInSD: Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'
> >>>>>
> >>>>> 2018-04-22 13:06:28,317+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='3d5f2a8a-881c-409e-93e9-aaa643c10e42') aborting: Task is aborted: "Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4'" - code 268 (task:1181)
> >>>>> 2018-04-22 13:06:28,318+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH deleteImage error=Image does not exist in domain: 'image=cabb8846-7a4b-4244-9835-5f603e682f33, domain=e5fd29c8-52ba-467e-be09-ca40ff054dd4' (dispatcher:82)
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 19, 2018 at 5:34 PM, Elad Ben Aharon
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> Triggered a sanity tier1 execution [1] using [2], which covers all the
> >>>>> requested areas, on iSCSI, NFS and Gluster.
> >>>>> I'll update with the results.
> >>>>>
> >>>>> [1]
> >>>>> https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-storage/1161/
> >>>>>
> >>>>> [2]
> >>>>> https://gerrit.ovirt.org/#/c/89830/
> >>>>> vdsm-4.30.0-291.git77aef9a.el7.x86_64
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 19, 2018 at 3:07 PM, Martin Polednik
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> On 19/04/18 14:54 +0300, Elad Ben Aharon wrote:
> >>>>>
> >>>>>
> >>>>> Hi Martin,
> >>>>>
> >>>>>
> >>>>> I see [1] requires a rebase, can you please take care?
> >>>>>
> >>>>>
> >>>>> Should be rebased.
> >>>>>
> >>>>> At the moment, our automation is stable only on iSCSI, NFS, Gluster
> >>>>> and FC.
> >>>>> Ceph is not supported, and Cinder will be stabilized soon; AFAIR, it's
> >>>>> not stable enough at the moment.
> >>>>>
> >>>>>
> >>>>> That is still pretty good.
> >>>>>
> >>>>>
> >>>>> [1] https://gerrit.ovirt.org/#/c/89830/
> >>>>>
> >>>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Wed, Apr 18, 2018 at 2:17 PM, Martin Polednik
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> On 18/04/18 11:37 +0300, Elad Ben Aharon wrote:
> >>>>>
> >>>>>
> >>>>> Hi, sorry if I misunderstood, I waited for more input regarding what
> >>>>> areas have to be tested here.
> >>>>>
> >>>>>
> >>>>> I'd say that you have quite a bit of freedom in this regard. GlusterFS
> >>>>> should be covered by Dennis, so iSCSI/NFS/ceph/cinder with some suite
> >>>>> that covers basic operations (start & stop VM, migrate it), snapshots
> >>>>> and merging them, and whatever else would be important for storage
> >>>>> sanity.
> >>>>>
> >>>>> mpolednik
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 18, 2018 at 11:16 AM, Martin Polednik
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>
> >>>>> On 11/04/18 16:52 +0300, Elad Ben Aharon wrote:
> >>>>>
> >>>>>
> >>>>> We can test this on iSCSI, NFS and GlusterFS. As for ceph and cinder,
> >>>>> we will have to check, since usually we don't execute our automation
> >>>>> on them.
> >>>>>
> >>>>> Any update on this? I believe the gluster tests were successful, OST
> >>>>> passes fine and unit tests pass fine; that makes the storage backends
> >>>>> test the last required piece.
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 11, 2018 at 4:38 PM, Raz Tamir <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>
> >>>>> +Elad
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 11, 2018 at 4:28 PM, Dan Kenigsberg <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>> On Wed, Apr 11, 2018 at 12:34 PM, Nir Soffer <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 11, 2018 at 12:31 PM Eyal Edri <[email protected]> wrote:
> >>>>>
> >>>>> Please make sure to run as many OST suites on this patch as possible
> >>>>> before merging (using 'ci please build').
> >>>>>
> >>>>>
> >>>>>
> >>>>> But note that OST is not a way to verify the patch.
> >>>>>
> >>>>>
> >>>>> Such changes require testing with all storage types we support.
> >>>>>
> >>>>> Nir
> >>>>>
> >>>>> On Tue, Apr 10, 2018 at 4:09 PM, Martin Polednik
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>
> >>>>> Hey,
> >>>>>
> >>>>>
> >>>>> I've created a patch[0] that is finally able to activate libvirt's
> >>>>> dynamic_ownership for VDSM while not negatively affecting
> >>>>> functionality of our storage code.
> >>>>>
> >>>>> That of course comes with quite a bit of code removal, mostly in the
> >>>>> area of host devices, hwrng and anything that touches devices; a bunch
> >>>>> of test changes and one XML generation caveat (storage is handled by
> >>>>> VDSM, therefore disk relabelling needs to be disabled on the VDSM
> >>>>> level).
> >>>>>
> >>>>> Because of the scope of the patch, I welcome storage/virt/network
> >>>>> people to review the code and consider the implication this change
> >>>>> has on current/future features.
> >>>>>
> >>>>> [0] https://gerrit.ovirt.org/#/c/89830/
> >>>>>
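> >>>>> To make that caveat concrete, the per-disk override would look roughly
> >>>>> like the sketch below: since vdsm keeps managing ownership of the
> >>>>> storage paths itself, the generated domain XML has to tell libvirt not
> >>>>> to relabel/chown the disk sources. This is an illustration of the idea
> >>>>> only, not the exact code from the patch:
> >>>>>
> >>>>> import xml.etree.ElementTree as ET
> >>>>>
> >>>>> def disable_disk_relabeling(domxml):
> >>>>>     # Add <seclabel model='dac' relabel='no'/> under each disk <source>
> >>>>>     # so libvirt's dynamic_ownership leaves vdsm-owned storage alone.
> >>>>>     dom = ET.fromstring(domxml)
> >>>>>     for source in dom.findall('./devices/disk/source'):
> >>>>>         seclabel = ET.SubElement(source, 'seclabel')
> >>>>>         seclabel.set('model', 'dac')
> >>>>>         seclabel.set('relabel', 'no')
> >>>>>     return ET.tostring(dom)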
> >>>>>
> >>>>> In particular: dynamic_ownership was set to 0 prehistorically (as part
> >>>>> of https://bugzilla.redhat.com/show_bug.cgi?id=554961 ) because libvirt,
> >>>>> running as root, was not able to play properly with root-squash nfs
> >>>>> mounts.
> >>>>>
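> >>>>> (For anyone who hasn't hit it: with root_squash the NFS server remaps
> >>>>> uid 0 to nfsnobody, so the chown() that dynamic_ownership makes libvirt
> >>>>> perform on the image fails. Below is a rough probe for that condition,
> >>>>> purely illustrative; the 36:36 vdsm:kvm pair is just an example target.)
> >>>>>
> >>>>> import errno
> >>>>> import os
> >>>>>
> >>>>> def chown_works_as_root(path, uid=36, gid=36):
> >>>>>     # On a root_squash export a root process is mapped to nfsnobody,
> >>>>>     # so this chown - the same call libvirt would issue with
> >>>>>     # dynamic_ownership=1 - fails with EPERM.
> >>>>>     try:
> >>>>>         os.chown(path, uid, gid)
> >>>>>         return True
> >>>>>     except OSError as e:
> >>>>>         if e.errno == errno.EPERM:
> >>>>>             return False
> >>>>>         raise
> >>>>>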
> >>>>> Have you attempted this use case?
> >>>>>
> >>>>> I join to Nir's request to run this with storage QE.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>>
> >>>>>
> >>>>> Raz Tamir
> >>>>> Manager, RHV QE
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Devel mailing list
> >>>>> [email protected]
> >>>>> http://lists.ovirt.org/mailman/listinfo/devel
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
> > <logs.tar.gz>
> >
> >
>
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/MK64KRYXK572LH5CJ7OV6DSXHGHKUSMV/
