[ovirt-devel] Re: [VDSM] Travis builds still fail on .coverage rename

Edward Haas Sun, 08 Jul 2018 06:42:55 -0700

On Sun, Jul 8, 2018 at 1:42 PM, Nir Soffer <nsof...@redhat.com> wrote:


> On Sat, Jul 7, 2018 at 9:11 AM Edward Haas <eh...@redhat.com> wrote:
>
>> On Sat, Jul 7, 2018 at 9:02 AM, Edward Haas <eh...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Jul 6, 2018 at 9:16 PM, Nir Soffer <nsof...@redhat.com> wrote:
>>>
>>>> On Fri, Jul 6, 2018 at 7:05 PM Edward Haas <eh...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 6 Jul 2018, at 18:41, Nir Soffer <nsof...@redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 6 Jul 2018, 18:25 Edward Haas, <eh...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 6 Jul 2018, at 14:35, Nir Soffer <nsof...@redhat.com> wrote:
>>>>>>
>>>>>> On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <eh...@redhat.com> wrote:
>>>>>>
>>>>>>> I do not know if it is relevant or not, but the tests that travis
>>>>>>> runs for master are taken from the 4.2 branch.
>>>>>>> OVS tests are now running using pytest.
>>>>>>>
>>>>>>
>>>>>> What do you mean by "taken from 4.2 branch"?
>>>>>>
>>>>>>
>>>>>> I mean that the branch checked out is 4.2 and not master. It even
>>>>>> says so on the console output.
>>>>>>
>>>>>
>>>>> Can you share the url of that build?
>>>>>
>>>>>
>>>>> I just clicked the icon on the vdsm repo:
>>>>> https://travis-ci.org/oVirt/vdsm
>>>>>
>>>>
>>>> This is indeed a 4.2 build. Every commit on GitHub is tested in travis.
>>>> We would also like to fix the 4.2 builds, but first we need to fix the
>>>> master builds.
>>>>
>>>> You can see here that the master builds fail:
>>>> https://travis-ci.org/oVirt/vdsm/builds
>>>>
>>>> Since we added gdb and python-debuginfo:
>>>> https://travis-ci.org/oVirt/vdsm/builds/400644077
>>>>
>>>> - centos build fails (network-py27)
>>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644079
>>>>
>>>> - fedora 28 build passes
>>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644081
>>>>
>>>> - fedora rawhide fails because we cannot rebuild the image,
>>>>   python-libblokdev is missing in rawhide.
>>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644083
>>>>   See
>>>>   https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/CDNETITY5RYOCQBIQQF2NUF6RAHGJRPW/
>>>>
>>>>
>>>>  I don't know anything about these tests, but this failure looks like:
>>>>
>>>> 1. first test has a timeout
>>>> 2. first test cleanup did not run because the cleanup code is not
>>>> correct
>>>> 3. second test fails because the first test did not clean up
>>>>
>>>> This looks like a real issue in the code.
>>>>
>>>
>>> This is the same problem we had on oVirt CI: there are Linux bridges on
>>> the node.
>>> I have posted a patch to fail earlier and show the real problem:
>>> https://gerrit.ovirt.org/#/c/92867/
>>> The travis-ci run for it is here:
>>> https://travis-ci.org/EdDev/vdsm/jobs/401143906
>>> This is the problem:
>>> cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl --system-id=random start (cwd None)
>>> cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is in use by: br_netfilter\n'; <rc> = 1
>>>
>>
> Who is using rmmod?
>

The ovs service is trying to load the ovs kmod. To do so, it needs to
unload the bridge kmod and reload it after the ovs one.
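
For reference, a minimal sketch of how to see which modules hold the bridge
kmod, which is what blocks the rmmod in the log above. The sample text is
made up to mirror that error, and module_users is a hypothetical helper
name; on a real host the input would come from lsmod.

```shell
#!/bin/sh
# Hypothetical lsmod output, shaped to match the rmmod error in the log.
sample_lsmod='Module                  Size  Used by
br_netfilter           24576  0
bridge                155648  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp'

# module_users NAME: read lsmod-formatted text on stdin and print the
# comma-separated list of modules that hold a reference to NAME.
module_users() {
    awk -v mod="$1" '$1 == mod { print $4 }'
}

# prints: br_netfilter
printf '%s\n' "$sample_lsmod" | module_users bridge

# On a real host: lsmod | module_users bridge
```

Note that br_netfilter is itself a kernel module that depends on bridge, not
a bridge device.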


>
>> Any idea who is creating the "br_netfilter" bridge? I guess this is
>>> travis-ci related.
>>>
>>
> Why do we care about br_netfilter? do we require a system without any
> bridge?
>

Yes, if the ovs kmod has not been loaded in advance.
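
A rough sketch of that condition as a pre-check, so a test run could fail
early with a clear message. This only mirrors the rule as stated here; it is
not what ovs-ctl does internally. ovs_precheck and both sample inputs are
hypothetical.

```shell
#!/bin/sh
# ovs_precheck: read lsmod-formatted text on stdin. Succeeds if openvswitch
# is already loaded, or if nothing holds the bridge module; fails otherwise.
ovs_precheck() {
    awk '
        $1 == "openvswitch"      { ovs = 1 }
        $1 == "bridge" && $3 > 0 { busy = 1; users = $4 }
        END {
            if (ovs)  { print "ok: openvswitch already loaded"; exit 0 }
            if (busy) { print "blocked: bridge in use by: " users; exit 1 }
            print "ok: bridge is not in use"
        }
    '
}

# Hypothetical host where bridge is held and openvswitch is not loaded.
busy_host='Module Size Used by
br_netfilter 24576 0
bridge 155648 1 br_netfilter'

# Hypothetical host where openvswitch was loaded in advance.
ovs_host='Module Size Used by
openvswitch 114688 0
bridge 155648 1 br_netfilter'

printf '%s\n' "$busy_host" | ovs_precheck || echo "would fail early here"
printf '%s\n' "$ovs_host" | ovs_precheck
```

On a real machine the input would be the output of lsmod.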


>
>
>> Actually, this may be Docker or some other package that is
>> installed/set up on it.
>> How can I run the Docker container with the tests locally to debug this?
>>
>
> Run this in vdsm root directory (copied from .travis.yml):
>
> export DOCKER_IMAGE=ovirtorg/vdsm-test-centos
>
> docker pull $DOCKER_IMAGE
>
> docker run \
>     --env TRAVIS_CI=1 \
>     --privileged \
>     --rm \
>     -it \
>     -v `pwd`:/vdsm:Z \
>     $DOCKER_IMAGE \
>     bash -c "cd /vdsm && ./autogen.sh --system && make && make --jobs=2 check"
>
> Since this is a privileged container, you probably want to run this inside a
> VM.
>

OK, will try. But I think the kmod state is determined by the machine the
Docker container runs on, which in this case is the travis slave.


>
>> We run "make check" both in travis (.travis.yml) and ovirt ci
>>>>>> (automation/check-patch.sh)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 6, 2018 at 12:51 AM, Nir Soffer <nsof...@redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 5, 2018 at 10:55 PM Nir Soffer <nsof...@redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jul 5, 2018 at 5:53 PM Nir Soffer <nsof...@redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <dan...@redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsof...@redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> > On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <
>>>>>>>>>>> dan...@redhat.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <
>>>>>>>>>>> nsof...@redhat.com> wrote:
>>>>>>>>>>> >> > Dan, travis builds still fail when renaming the coverage file
>>>>>>>>>>> >> > even after your last patch.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > ...........................SS.
>>>>>>>>>>> SS..........................................................
>>>>>>>>>>> ............................................................
>>>>>>>>>>> ...........................................SS...............
>>>>>>>>>>> ...................................S.S......................
>>>>>>>>>>> ..........S................................SS.....SS........
>>>>>>>>>>> ....................................S...............SSS...S.
>>>>>>>>>>> ....S.............................................S.........
>>>>>>>>>>> .......................................................SSS..
>>>>>>>>>>> ..........SSSS..SSSSSSSSS.SS................................
>>>>>>>>>>> ............................................................
>>>>>>>>>>> ............................................................
>>>>>>>>>>> ..........
>>>>>>>>>>> >> > ------------------------------
>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>> >> > Ran 1267 tests in 99.239s
>>>>>>>>>>> >> > OK (SKIP=63)
>>>>>>>>>>> >> > [ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
>>>>>>>>>>> >> > make[1]: *** [check] Error 1
>>>>>>>>>>> >> > make[1]: Leaving directory `/vdsm/tests'
>>>>>>>>>>> >> > ERROR: InvocationError: '/usr/bin/make -C tests check'
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > https://travis-ci.org/oVirt/vdsm/jobs/399932012
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Do you have any idea what is wrong there?
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Why don't we have any error message from the failed command?
>>>>>>>>>>> >>
>>>>>>>>>>> >> No idea, nothing pops to mind.
>>>>>>>>>>> >> We can revert to the sillier [ -f .coverage ] condition instead
>>>>>>>>>>> >> of understanding (yeah, this feels dirty)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this
>>>>>>>>>>> > failure.
>>>>>>>>>>> >
>>>>>>>>>>> > Now we have failures for the pywatch_test, and some network
>>>>>>>>>>> > tests. Can someone from network look at this?
>>>>>>>>>>> > https://travis-ci.org/nirs/vdsm/builds/400204807
>>>>>>>>>>>
>>>>>>>>>>> https://travis-ci.org/nirs/vdsm/jobs/400204808 shows
>>>>>>>>>>>
>>>>>>>>>>>               ConfigNetworkError: (21, 'Executing commands failed:
>>>>>>>>>>> ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge
>>>>>>>>>>> named vdsmbr_test already exists')
>>>>>>>>>>>
>>>>>>>>>>> which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea
>>>>>>>>>>> why it shows here?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Maybe one failed test leaves a dirty host for the next test?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> network tests now fail only on CentOS.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> py-watch seems to be failing due to missing gdb on the travis
>>>>>>>>>>> image
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> cmdutils.py                151 DEBUG    ./py-watch 0.1 sleep 10
>>>>>>>>>>> (cwd None)
>>>>>>>>>>> cmdutils.py                159 DEBUG    FAILED: <err> =
>>>>>>>>>>> 'Traceback
>>>>>>>>>>> (most recent call last):\n  File "./py-watch", line 60, in
>>>>>>>>>>> <module>\n
>>>>>>>>>>>   dump_trace(watched_proc)\n  File "./py-watch", line 32, in
>>>>>>>>>>> dump_trace\n    \'thread apply all py-bt\'])\n  File
>>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575,
>>>>>>>>>>> in
>>>>>>>>>>> call\n    p = Popen(*popenargs, **kwargs)\n  File
>>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822,
>>>>>>>>>>> in
>>>>>>>>>>> __init__\n    restore_signals, start_new_session)\n  File
>>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line
>>>>>>>>>>> 1567, in
>>>>>>>>>>> _execute_child\n    raise child_exception_type(errno_num,
>>>>>>>>>>> err_msg)\nOSError: [Errno 2] No such file or directory:
>>>>>>>>>>> \'gdb\'\n';
>>>>>>>>>>> <rc> = 1
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cool, easy fix.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fixed by https://gerrit.ovirt.org/#/c/92846/
>>>>>>>>>
>>>>>>>>
>>>>>>>> Fedora 28 build is green with this change:
>>>>>>>> https://travis-ci.org/nirs/vdsm/jobs/400549561
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ___________________________________ summary ____________________________________
>>>>>>>>   tests: commands succeeded
>>>>>>>>   storage-py27: commands succeeded
>>>>>>>>   storage-py36: commands succeeded
>>>>>>>>   lib-py27: commands succeeded
>>>>>>>>   lib-py36: commands succeeded
>>>>>>>>   network-py27: commands succeeded
>>>>>>>>   network-py36: commands succeeded
>>>>>>>>   virt-py27: commands succeeded
>>>>>>>>   virt-py36: commands succeeded
>>>>>>>>   congratulations :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Nir, could you remind me what is "ERROR: InterpreterNotFound:
>>>>>>>>>>> python3.6" and how can we avoid it? it keeps distracting during
>>>>>>>>>>> debugging test failures.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We can avoid it in travis using env matrix.
>>>>>>>>>>
>>>>>>>>>> Currently we run "make check", which runs all the tox envs
>>>>>>>>>> (e.g. storage-py27,storage-py36) regardless of the build type.
>>>>>>>>>> This is good for manual usage when you don't know which python
>>>>>>>>>> version is available on a developer machine. For example, if I
>>>>>>>>>> have python 3.7 installed, maybe I would like to test with it.
>>>>>>>>>>
>>>>>>>>>> We can change this so we will test only *-py27 on CentOS, and both
>>>>>>>>>> *-py27 and *-py36 on Fedora.
>>>>>>>>>>
>>>>>>>>>> We can do the same in ovirt CI but it will be harder, since we don't
>>>>>>>>>> have a declarative way to configure this.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fixed all builds using --enable-python3:
>>>>>>>>> https://gerrit.ovirt.org/#/c/92847/
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here is an example from CentOS build - no false errors.
>>>>>>>>
>>>>>>>> ___________________________________ summary ____________________________________
>>>>>>>>   tests: commands succeeded
>>>>>>>>   storage-py27: commands succeeded
>>>>>>>>   lib-py27: commands succeeded
>>>>>>>> ERROR:   network-py27: commands failed
>>>>>>>>   virt-py27: commands succeeded
>>>>>>>> make: *** [tests] Error 1
>>>>>>>> make: *** Waiting for unfinished jobs....
>>>>>>>> ___________________________________ summary ____________________________________
>>>>>>>>   pylint: commands succeeded
>>>>>>>>   congratulations :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nir
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>
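
On the original question in this thread (why make reported "Error 1" with no
message from the rename command): one plausible reading, not confirmed
anywhere above, is that NOSE_WITH_COVERAGE was empty in that run. In that
case the "[ -n ... ] && mv ..." line exits with status 1 without printing
anything, and make treats a non-zero recipe line as a failure. The sketch
below shows only the shell behaviour; the function names are hypothetical
and echo stands in for the real mv.

```shell
#!/bin/sh
# Same shape as the recipe line quoted above: an and-list whose first
# command is a test. echo stands in for the real mv.
rename_with_and() {
    [ -n "$NOSE_WITH_COVERAGE" ] && echo "mv .coverage .coverage-nose-py2"
}

# Alternative: an if statement returns 0 when its condition is false
# and there is no else branch.
rename_with_if() {
    if [ -n "$NOSE_WITH_COVERAGE" ]; then
        echo "mv .coverage .coverage-nose-py2"
    fi
}

NOSE_WITH_COVERAGE=

and_status=0
rename_with_and || and_status=$?

if_status=0
rename_with_if || if_status=$?

# prints: and-list: 1, if: 0
echo "and-list: $and_status, if: $if_status"
```

Neither form prints an error when the variable is empty, but only the
and-list returns a status that make would report as a failure.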

List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/4H6I3CQKOAT5PGT22CRO54HWJFOWD4AC/
