On Thu, Apr 20, 2017 at 11:55 AM, Piotr Kliczewski <piotr.kliczew...@gmail.com> wrote:

> On Thu, Apr 20, 2017 at 10:32 AM, Yaniv Kaul <yk...@redhat.com> wrote:
> > No, that's not the issue.
> > I've seen it happening a few times.
> >
> > 1. It always happens with the ISO domain (which we don't use anyway in o-s-t)
> > 2. Apparently, only one host is asking for a mount:
> >  authenticated mount request from 192.168.201.4:713 for /exports/nfs/iso (/exports/nfs/iso)
> >
> > (/var/log/messages of the NFS server)
> >
> > And indeed, you can see in [1] that host1 made the request and all is well on it.
> >
> > However, there are connection issues with host0 which cause a timeout in
> > connectStorageServer(). From [2]:
> >
> > 2017-04-19 18:58:58,465-04 DEBUG
> > [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
> > Message received:
> > {"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host0:192912448","message":"Vds timeout occured"},"id":null}
> >
> > 2017-04-19 18:58:58,475-04 ERROR
> > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> > (org.ovirt.thread.pool-7-thread-37) [755b908a] EVENT_ID:
> > VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
> > Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host0 command
> > ConnectStorageServerVDS failed: Message timeout which can be caused by
> > communication issues
> > 2017-04-19 18:58:58,475-04 INFO
> > [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
> > (org.ovirt.thread.pool-7-thread-37) [755b908a] Command
> > 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand'
> > return value '
> > ServerConnectionStatusReturn:{status='Status [code=5022, message=Message timeout which can be caused by communication issues]'}
> >
> >
> > I wonder why, but in /var/log/messages [3], I'm seeing:
> > Apr 19 18:56:58 lago-basic-suite-master-host0 journal: vdsm Executor WARN
> > Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params':
> > {u'connectionParams': [{u'id': u'4ca8fc84-d872-4a7f-907f-9445bda7b6d1',
> > u'connection': u'192.168.201.3:/exports/nfs/share1', u'iqn': u'', u'user': u'',
> > u'tpgt': u'1', u'protocol_version': u'4.2', u'password': '********',
> > u'port': u''}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000',
> > u'domainType': 1}, 'jsonrpc': '2.0', 'method':
> > u'StoragePool.connectStorageServer', 'id':
> > u'057da9c2-1e67-4c2f-9511-7d9de250386b'} at 0x2f44110> timeout=60,
> > duration=60 at 0x2f44310> task#=9 at 0x2ac11d0>
> > ...
> >
>
> I see the following sequence:
>
> The message is sent:
>
> 2017-04-19 18:55:58,020-04 DEBUG
> [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
> (org.ovirt.thread.pool-7-thread-37) [755b908a] Message sent: SEND
> destination:jms.topic.vdsm_requests
> content-length:381
> ovirtCorrelationId:755b908a
> reply-to:jms.topic.vdsm_responses
>
> <JsonRpcRequest id: "057da9c2-1e67-4c2f-9511-7d9de250386b", method:
> StoragePool.connectStorageServer, params:
> {storagepoolID=00000000-0000-0000-0000-000000000000, domainType=1,
> connectionParams=[{password=, protocol_version=4.2, port=, iqn=,
> connection=192.168.201.3:/exports/nfs/share1,
> id=4ca8fc84-d872-4a7f-907f-9445bda7b6d1, user=, tpgt=1}]}>
>
> There is no response within the specified amount of time and we time out:
>
> 2017-04-19 18:58:58,465-04 DEBUG
> [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
> (ResponseWorker) [] Message received:
> {"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host0:192912448","message":"Vds timeout occured"},"id":null}
>
> As Yaniv pointed out, here is why we never get the response:
>
> Apr 19 18:58:58 lago-basic-suite-master-host0 journal: vdsm Executor
> WARN Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask
> {'params': {u'connectionParams': [{u'id':
> u'4ca8fc84-d872-4a7f-907f-9445bda7b6d1', u'connection':
> u'192.168.201.3:/exports/nfs/share1', u'iqn': u'', u'user': u'',
> u'tpgt': u'1', u'protocol_version': u'4.2', u'password': '********',
> u'port': u''}], u'storagepoolID':
> u'00000000-0000-0000-0000-000000000000', u'domainType': 1}, 'jsonrpc':
> '2.0', 'method': u'StoragePool.connectStorageServer', 'id':
> u'057da9c2-1e67-4c2f-9511-7d9de250386b'} at 0x2f44110> timeout=60,
> duration=180 at 0x2f44310> task#=9 at 0x2ac11d0>
>
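
(For illustration, a rough Python sketch of why a blocked worker gives us
only this warning and never a reply - this is not vdsm's actual executor
code and the names are made up: the watchdog can see that the task exceeded
its timeout, but it cannot unblock the thread that is stuck in the hanging
NFS mount, so no response is ever sent back and the engine gives up after
its own timeout.)

import threading
import time

TASK_TIMEOUT = 60  # seconds, matching timeout=60 in the log above


def connect_storage_server():
    # Stand-in for the real mount of 192.168.201.3:/exports/nfs/share1;
    # looping forever simulates a mount that hangs on an unreachable server.
    while True:
        time.sleep(3600)


def run_with_watchdog(task, name):
    start = time.time()
    worker = threading.Thread(target=task, name=name)
    worker.daemon = True
    worker.start()
    while worker.is_alive():
        worker.join(TASK_TIMEOUT)
        if worker.is_alive():
            # All the watchdog can do is warn, once per timeout period.
            print("Worker blocked: <Worker name=%s> timeout=%d, duration=%d"
                  % (name, TASK_TIMEOUT, int(time.time() - start)))


run_with_watchdog(connect_storage_server, "jsonrpc/3")
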
> >
> > 3. Also, there is still the infamous 'unable to update response' issue.
> >
>
> When we see a timeout on a call, our default behavior is to reconnect,
> and when we reconnect we clean the pending messages.
> As a result, when we then receive a response to a message that was sent
> before the disconnect, we say it is unknown to us.
>
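
(For illustration, a minimal Python sketch of the mechanism described above -
this is not the real org.ovirt.vdsm.jsonrpc.client code, and the class name
and request id are made up: pending calls are tracked by id, a timeout
triggers a reconnect that clears the pending entries, and a response that
arrives after that can no longer be matched, which is what the engine logs
as "Not able to update response".)

class JsonRpcTracker(object):
    def __init__(self):
        self.pending = {}  # request id -> callback waiting for the result

    def send(self, request_id, callback):
        self.pending[request_id] = callback
        # ... the request is written to the wire here ...

    def on_timeout(self):
        # Default behavior on a call timeout: reconnect and clean all
        # pending messages.
        self.pending.clear()
        # ... reconnect here ...

    def on_response(self, request_id, result):
        callback = self.pending.pop(request_id, None)
        if callback is None:
            # The id is unknown after the cleanup, so all we can do is log it.
            print('Not able to update response for "%s"' % request_id)
            return
        callback(result)


tracker = JsonRpcTracker()
tracker.send("11111111-2222-3333-4444-555555555555", lambda result: None)
tracker.on_timeout()    # the reconnect drops the pending entry
tracker.on_response("11111111-2222-3333-4444-555555555555", True)  # -> unknown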

But the example I've given was earlier than the storage issue?
Y.


>
> > {"jsonrpc":"2.0","method":"Host.ping","params":{},"id":"
> 7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"}�
> > 2017-04-19 18:54:27,843-04 DEBUG
> > [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
> > (org.ovirt.thread.pool-7-thread-15) [62d198cc] Message sent: SEND
> > destination:jms.topic.vdsm_requests
> > content-length:94
> > ovirtCorrelationId:62d198cc
> > reply-to:jms.topic.vdsm_responses
> >
> > <JsonRpcRequest id: "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0", method:
> > Host.ping, params: {}>
> > 2017-04-19 18:54:27,885-04 DEBUG
> > [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message]
> > (org.ovirt.thread.pool-7-thread-16) [1f9aac13] SEND
> > ovirtCorrelationId:1f9aac13
> > destination:jms.topic.vdsm_requests
> > reply-to:jms.topic.vdsm_responses
> > content-length:94
> >
> > ...
> >
> > {"jsonrpc": "2.0", "id": "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0",
> "result":
> > true}�
> > 2017-04-19 18:54:32,132-04 DEBUG
> > [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
> > Message received: {"jsonrpc": "2.0", "id": "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0", "result": true}
> > 2017-04-19 18:54:32,133-04 ERROR
> > [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able
> > to update response for "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"
> >
> >
> > Would be nice to understand why.
> >
> >
> >
> > 4. Lastly, MOM is not running. Why?
> >
> >
> > Please open a bug with the details from item #2 above.
> > Y.
> >
> >
> > [1]
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/supervdsm.log
> > [2]
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
> > [3]
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/messages
> >
> >
> >
> >
> >
> > On Thu, Apr 20, 2017 at 9:27 AM, Gil Shinar <gshi...@redhat.com> wrote:
> >>
> >> Test failed: add_secondary_storage_domains
> >> Link to suspected patches:
> >> Link to Job:
> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403
> >> Link to all logs:
> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py
> >>
> >>
> >> Error seems to be:
> >> 2017-04-19 18:58:58,774-0400 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') Unexpected error (task:871)
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run
> >>     return fn(*args, **kargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
> >>     res = f(*args, **kwargs)
> >>   File "/usr/share/vdsm/storage/hsm.py", line 2709, in getStorageDomainInfo
> >>     dom = self.validateSdUUID(sdUUID)
> >>   File "/usr/share/vdsm/storage/hsm.py", line 298, in validateSdUUID
> >>     sdDom = sdCache.produce(sdUUID=sdUUID)
> >>   File "/usr/share/vdsm/storage/sdc.py", line 112, in produce
> >>     domain.getRealDomain()
> >>   File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
> >>     return self._cache._realProduce(self._sdUUID)
> >>   File "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce
> >>     domain = self._findDomain(sdUUID)
> >>   File "/usr/share/vdsm/storage/sdc.py", line 153, in _findDomain
> >>     return findMethod(sdUUID)
> >>   File "/usr/share/vdsm/storage/sdc.py", line 178, in _findUnfetchedDomain
> >>     raise se.StorageDomainDoesNotExist(sdUUID)
> >> StorageDomainDoesNotExist: Storage domain does not exist: (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)
> >> 2017-04-19 18:58:58,777-0400 INFO  (jsonrpc/2) [storage.TaskManager.Task] (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') aborting: Task is aborted: 'Storage domain does not exist' - code 358 (task:1176)
> >> 2017-04-19 18:58:58,777-0400 ERROR (jsonrpc/2) [storage.Dispatcher] {'status': {'message': "Storage domain does not exist: (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)", 'code': 358}} (dispatcher:78)
> >>
> >
> >
> >
>
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
