[ https://issues.apache.org/jira/browse/DISPATCH-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcel Meulemans updated DISPATCH-1019: --------------------------------------- Description: After DISPATCH-966 has been fixed I am still experiencing problems in a network with many clients / addresses. I am running a three node fully connected mesh of dispatch routers with 10000 attached clients all with two address messaging at around 100 msg/sec. In the logs I am seeing the following errors: {{2018-05-29 14:31:05.145732 +0000 ERROR (error) Invalid message: Insufficient Data to Determine Tag}} {{2018-05-29 14:31:05.145748 +0000 ERROR (error) Invalid message: Can't convert message field body}} Which, in turn, lead to python errors like: {{2018-05-29 14:31:05.145971 +0000 ROUTER_MA (trace) RCVD: MAU(id=None pv=1 area=0 mobile_seq=0)}} {{ 2018-05-29 14:31:05.146130 +0000 ROUTER (error) Exception in control message processing}} {{ Traceback (most recent call last):}} {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, in handleControlMessage}} {{self.mobile_address_engine.handle_mau(msg, now)}} {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, in handle_mau}} {{node = self.node_tracker.router_node(msg.id)}} {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in router_node}} {{return self.nodes[node_id]}} {{ KeyError: None}} {{ 2018-05-29 14:31:05.146175 +0000 ROUTER (error) Control message error: opcode=MAU body=None}} I have tracked down the cause of the "Insufficient Data to Determine Tag" message to the following: During the call to {{qd_parse}} of the {{MAU}} message the {{qd_iterator_t}} reaches the end of the buffer list before it should. Specifically the call to {{qd_iterator_advance}} here [https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails" to move forward a certain number of bytes (e.g. 31) even though the {{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 84802). The "fail" is because we reach the end of the buffer list before we should (here [https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]). What I have not been able to figure out yet is why this happens because it is not consistent. Many large MAU message are parsed correctly only sometimes not. I am able to reproduce these errors every time I run my tests. There may be a time component involved because the more logging I to the router code, the less often the errors seem to occur. was: After DISPATCH-966 has been fixed I am still experiencing problems in a network with many clients / addresses. I am running a three node fully connected mesh of dispatch routers with 10000 attached clients all with two address messaging at around 100 msg/sec. In the logs I am seeing the following errors: {{2018-05-29 14:31:05.145732 +0000 ERROR (error) Invalid message: Insufficient Data to Determine Tag}} {{2018-05-29 14:31:05.145748 +0000 ERROR (error) Invalid message: Can't convert message field body}} Which, in turn, lead to python errors like: {{2018-05-29 14:31:05.145971 +0000 ROUTER_MA (trace) RCVD: MAU(id=None pv=1 area=0 mobile_seq=0)}} {{2018-05-29 14:31:05.146130 +0000 ROUTER (error) Exception in control message processing}} {{Traceback (most recent call last):}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, in handleControlMessage}} {{ self.mobile_address_engine.handle_mau(msg, now)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, in handle_mau}} {{ node = self.node_tracker.router_node(msg.id)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in router_node}} {{ return self.nodes[node_id]}} {{KeyError: None}} {{2018-05-29 14:31:05.146175 +0000 ROUTER (error) Control message error: opcode=MAU body=None}} I have tracked down the cause of the "Insufficient Data to Determine Tag" message to the following: During the call to {{qd_parse}} of the {{MAU}} message the {{qd_iterator_t}} reaches the end of the buffer list before it should. Specifically the call to {{qd_iterator_advance}} here [https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails" to move forward a certain number of bytes (e.g. 31) even though the {{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 84802). The "fail" is because we reach the end of the buffer list before we should (here [https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]). What I have not been able to figure out yet is why this happens because it is not consistent. Many large MAU message are parsed correctly only sometimes not. I am able to reproduce these errors every time I run my tests. There may be a time component involved because the more logging I to the router code, the less often the errors seem to occur. > Messaging instability in networks with many clients / addresses. > ---------------------------------------------------------------- > > Key: DISPATCH-1019 > URL: https://issues.apache.org/jira/browse/DISPATCH-1019 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine > Affects Versions: 1.1.0 > Reporter: Marcel Meulemans > Priority: Major > > After DISPATCH-966 has been fixed I am still experiencing problems in a > network with many clients / addresses. I am running a three node fully > connected mesh of dispatch routers with 10000 attached clients all with two > address messaging at around 100 msg/sec. > In the logs I am seeing the following errors: > {{2018-05-29 14:31:05.145732 +0000 ERROR (error) Invalid message: > Insufficient Data to Determine Tag}} > {{2018-05-29 14:31:05.145748 +0000 ERROR (error) Invalid message: Can't > convert message field body}} > Which, in turn, lead to python errors like: > {{2018-05-29 14:31:05.145971 +0000 ROUTER_MA (trace) RCVD: MAU(id=None pv=1 > area=0 mobile_seq=0)}} > {{ 2018-05-29 14:31:05.146130 +0000 ROUTER (error) Exception in control > message processing}} > {{ Traceback (most recent call last):}} > {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line > 157, in handleControlMessage}} > {{self.mobile_address_engine.handle_mau(msg, now)}} > {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, > in handle_mau}} > {{node = self.node_tracker.router_node(msg.id)}} > {{File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, > in router_node}} > {{return self.nodes[node_id]}} > {{ KeyError: None}} > {{ 2018-05-29 14:31:05.146175 +0000 ROUTER (error) Control message error: > opcode=MAU body=None}} > I have tracked down the cause of the "Insufficient Data to Determine Tag" > message to the following: During the call to {{qd_parse}} of the {{MAU}} > message the {{qd_iterator_t}} reaches the end of the buffer list before it > should. Specifically the call to {{qd_iterator_advance}} here > [https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] > "fails" to move forward a certain number of bytes (e.g. 31) even though the > {{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. > 84802). The "fail" is because we reach the end of the buffer list before we > should (here > [https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]). > What I have not been able to figure out yet is why this happens because it is > not consistent. Many large MAU message are parsed correctly only sometimes > not. I am able to reproduce these errors every time I run my tests. There may > be a time component involved because the more logging I to the router code, > the less often the errors seem to occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org