Yaniv Kaul <[email protected]> writes: > On Wed, Jan 11, 2017 at 12:49 PM, Milan Zamazal <[email protected]> wrote: > >> I just ran ovirt-system-tests on two very different machines. It passed >> on one of them, while it failed on the other one, at a different place: >> >> @ Run test: 005_network_by_label.py: >> nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] >> # assign_hosts_network_label: >> Error while running thread >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/lago/utils.py", line 55, in >> _ret_via_queue >> queue.put({'return': func()}) >> File "/var/local/lago/ovirt-system-tests/basic-suite-master/test- >> scenarios/005_network_by_label.py", line 56, in _assign_host_network_label >> host_nic=nic >> File >> "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", >> line 16231, in add >> headers={"Correlation-Id":correlation_id, "Expect":expect} >> File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", >> line 79, in add >> return self.request('POST', url, body, headers, cls=cls) >> File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", >> line 122, in request >> persistent_auth=self.__persistent_auth >> File >> "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", >> line 79, in do_request >> persistent_auth) >> File >> "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", >> line 162, in __do_request >> raise errors.RequestError(response_code, response_reason, >> response_body) >> RequestError: >> status: 409 >> reason: Conflict >> detail: Cannot add Label. Operation can be performed only when Host >> status is Maintenance, Up, NonOperational. >> > > This is an issue we've seen from time to time and have not figured it out > yet. Do you have engine logs for it?
Yes, I still have that test run instance available. Here's an excerpt; I'll send you the complete logs off-list (they are large):
[Attachment: engine.log-excerpt.xz (application/xz)]
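For what it's worth, I wonder whether the test should simply wait for the host to reach one of the allowed states before assigning the label. Below is a minimal sketch of such a poll loop using the SDK v3 seen in the traceback; the engine URL, credentials, host name and the final label-assignment step are placeholders, not the actual ovirt-system-tests code:

import time

from ovirtsdk.api import API

ALLOWED_STATES = ('maintenance', 'up', 'non_operational')


def wait_for_host_state(api, host_name, timeout=300, interval=5):
    # Poll the engine until the host reaches a state that allows
    # adding a network label, or give up after `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        host = api.hosts.get(host_name)
        if host is not None and host.status.state in ALLOWED_STATES:
            return host
        time.sleep(interval)
    raise RuntimeError('host %s did not reach an allowed state' % host_name)


# Placeholder engine URL, credentials and host name:
api = API(url='https://engine/ovirt-engine/api', username='admin@internal',
          password='secret', insecure=True)
host = wait_for_host_state(api, 'host-0')
# ... only now run the label add() call that returned 409 above.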
>> I can also see occasional errors like the following in vdsm.log:
>>
>> ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from
>> <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.201.3',
>> 47434, 0, 0) at 0x271fd88>: (104, 'Connection reset by peer')
>> (betterAsyncore:119)
>>
>
> This is the core issue of today - but it is probably unrelated to the issue
> you've just described, which we have seen happening from time to time in the
> past (I'd say I last saw it ~2 weeks ago or so, but it's not easily
> reproducible for me).
> Y.
>
>
>>
>> So we are probably dealing with an error that occurs "randomly" and is
>> not related to a particular test.
>>
>> Daniel Belenky <[email protected]> writes:
>>
>> > Link to Jenkins
>> > <http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4648/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>
>> >
>> > On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <[email protected]>
>> > wrote:
>> >
>> >> Hi all
>> >>
>> >> On 01/11/2017 08:52 AM, Eyal Edri wrote:
>> >>
>> >> Adding Tomas from Virt.
>> >>
>> >> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski <
>> >> [email protected]> wrote:
>> >>
>> >>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <[email protected]>
>> >>> wrote:
>> >>> > Hi all,
>> >>> >
>> >>> > test-repo_ovirt_experimental_master (link to Jenkins) job failed on
>> >>> > basic_sanity scenario.
>> >>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
>> >>> >
>> >>> > From looking at the logs, it seems that the reason is VDSM.
>> >>> >
>> >>> > In the VDSM log, I see the following error:
>> >>> >
>> >>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL
>> >>> > error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1',
>> >>> > 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
>> >>>
>> >>
>> >> Daniel, could you please remind me of the Jenkins link? I see something
>> >> suspicious in the Vdsm log.
>> >> Most notably, Vdsm received SIGTERM. Is this expected and part of the
>> >> test?
>> >>
>> >> >
>> >>>
>> >>> This issue means that the client closed the connection while vdsm was
>> >>> still replying. It can happen at any time when the client does not
>> >>> close the connection cleanly. As you can see, the client connected
>> >>> locally ('::1').
>> >>>
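Just to illustrate the general mechanism with a plain-socket toy example (Python 3, nothing vdsm-specific): when the peer disconnects before reading the full reply, the writing side sees exactly this kind of reset / unexpected EOF.

import socket
import threading

srv = socket.socket()
srv.bind(('127.0.0.1', 0))           # pick a free port
srv.listen(1)
port = srv.getsockname()[1]


def serve():
    conn, _ = srv.accept()
    conn.recv(1024)                  # read the request
    try:
        for _ in range(1000):        # keep writing a large reply
            conn.sendall(b'x' * 65536)
    except OSError as exc:           # e.g. [Errno 104] Connection reset by peer
        print('server saw:', exc)
    finally:
        conn.close()
        srv.close()


t = threading.Thread(target=serve)
t.start()

cli = socket.socket()
cli.connect(('127.0.0.1', port))
cli.sendall(b'request')
cli.close()                          # disconnect without reading the reply
t.join()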
>> >>> >
>> >>> > Also, when looking at the MOM logs, I see the following:
>> >>> >
>> >>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to
>> >>> > VDSM! [Errno 111] Connection refused
>> >>> >
>> >>>
>> >>> Looking at the log, vdsm had no open socket at this time.
>> >>
>> >>
>> >>
>> >> Correct, but IIRC we have a race on startup - that's why MOM retries the
>> >> connection. After the retry, MOM seems to behave correctly:
>> >>
>> >> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
>> >> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
>> >>
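Right - the usual way to paper over that kind of startup race is a bounded retry loop. A minimal sketch (hypothetical names, timeouts and address, not MOM's actual implementation):

import socket
import time


def connect_with_retry(host, port, timeout=60, interval=2):
    # Keep trying until the server side is listening or we run out of time.
    deadline = time.time() + timeout
    while True:
        try:
            return socket.create_connection((host, port), timeout=interval)
        except OSError:              # e.g. [Errno 111] Connection refused
            if time.time() >= deadline:
                raise
            time.sleep(interval)


# sock = connect_with_retry('localhost', 54321)  # placeholder address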
>> >> --
>> >> Francesco Romani
>> >> Red Hat Engineering Virtualization R & D
>> >> IRC: fromani
>> >>
>> >>
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
