michael goulish created DISPATCH-106:
----------------------------------------
Summary: pn link corruption after router restart
Key: DISPATCH-106
URL: https://issues.apache.org/jira/browse/DISPATCH-106
Project: Qpid Dispatch
Issue Type: Bug
Components: Router Node
Affects Versions: 0.4
Reporter: michael goulish
With the standard 6-node demo network (A-D, X, Y), after killing and
restarting node Y, I see a bad link on router D, which causes D to crash.
Here is the sequence of events from the router logs and the topologist
testing program:
01:05:05.367 Killing router Y, pid 20074
01:05:05.367 Sleeping 30 seconds
01:05:35.367 Restarting router Y, pid 20120
01:05:38 Router D : last "valid origins" post to its log file :
Node QDR.C valid origins: []
01:05:46 Router D posts to its log file:
Exited Router Flux Mode
01:06:05.368 checking for crash after node bounce
( no crash detected )
01:06:17 last post to router D log file
ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0)
01:06:35.369 second check for crash. (none detected)
01:06:35.370 getting topology
( Node D fails to respond. PID 20072 )
( core file, timestamped 01:06 )
Here is the backtrace from router D's core file:
{
#0 pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
#1 0x00007ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112
#2 0x00007ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:120
#3 0x0000003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
#4 0x0000003e408056bc in ffi_call () from /lib64/libffi.so.6
#5 0x00007ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so
#6 0x00007ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so
#7 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#8 0x00000036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9 0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#10 0x00000036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00000036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#13 0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#14 0x00000036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
#15 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x00000036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
#17 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#18 0x00000036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
#19 0x00000036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0
#20 0x00007ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0) at /home/mick/dispatch/src/python_embedded.c:519
#21 0x00007ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50) at /home/mick/dispatch/src/router_node.c:922
#22 0x00007ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221
#23 0x00007ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:362
#24 0x00007ff73fa80135 in handler (handler_context=0x1dbd6f0, conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438
#25 0x00007ff73fa98346 in process_connector (qd_server=0x1d78460, cxtr=0x1e1b9b0) at /home/mick/dispatch/src/server.c:322
#26 0x00007ff73fa98c1f in thread_run (arg=0x1d70d30) at /home/mick/dispatch/src/server.c:546
#27 0x0000003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
...
}
Let's go up to frame #1, qd_router_link_name,
at /home/mick/dispatch/src/router_agent.c:112
(gdb) print * link
$1 =
{
prev = 0x7ff72800b210,
next = 0x7ff72800b390,
mask_bit = 3,
link_type = QD_LINK_ROUTER,
link_direction = QD_OUTGOING,
owning_addr = 0x1d7d6c0,
waypoint = 0x0,
link = 0x7ff7280099d0,
connected_link = 0x0,
ref = 0x7ff72800f350,
target = 0x0,
event_fifo =
{
head = 0x0,
tail = 0x0,
scratch = 0x0,
size = 0
},
msg_fifo =
{
head = 0x7ff73003c230,
tail = 0x7ff73003bb70,
scratch = 0x7ff73003b9f0,
size = 102
}
}
(gdb) print * (link->link)
$2 =
{
pn_sess = 0x7ff72804b7b0,
pn_link = 0x7ff72804d6a0,
context = 0x7ff72800b2d0,
node = 0x1db6bb0,
drain_mode = false
}
(gdb) print * (link->link->pn_link)
$3 = {
endpoint = {
type = 33686018,
state = 33686018,
error = 0x202020202020202,
condition = {
name = 0x202020202020202,
description = 0x202020202020202,
info = 0x202020202020202
},
remote_condition = {
name = 0x202020202020202,
description = 0x202020202020202,
info = 0x202020202020202
},
endpoint_next = 0x202020202020202,
endpoint_prev = 0x202020202020202,
transport_next = 0x202020202020202,
transport_prev = 0x202020202020202,
modified = 2,
freed = 2,
posted_final = 2
},
source = {
address = 0x202020202020202,
properties = 0x202020202020202,
capabilities = 0x202020202020202,
outcomes = 0x202020202020202,
filter = 0x202020202020202,
durability = (PN_DELIVERIES | unknown: 33686016),
expiry_policy = 33686018,
timeout = 33686018,
type = 33686018,
distribution_mode = (PN_DIST_MODE_MOVE | unknown: 33686016),
dynamic = 2
},
target = {
address = 0x202020202020202,
properties = 0x202020202020202,
capabilities = 0x202020202020202,
outcomes = 0x202020202020202,
filter = 0x202020202020202,
durability = (PN_DELIVERIES | unknown: 33686016),
expiry_policy = 33686018,
( etc. -- it's all garbage. )
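
A quick sanity check on the values in the pn_link dump above, as a rough sketch rather than anything taken from the router or Proton sources: the 4-byte fields all print as 33686018 and the pointer fields as 0x202020202020202, and both are just the byte 0x02 repeated. That pattern is consistent with the whole struct having been overwritten with a single fill byte (for example after the memory was freed and reused), though the dump alone does not prove how it got that way. The helper name repeated_byte is hypothetical, and the check on the durability field assumes PN_DELIVERIES has the value 2.

```python
from typing import Optional


def repeated_byte(value: int, width: int) -> Optional[int]:
    """Return the byte if `value` is `width` copies of one byte, else None."""
    raw = value.to_bytes(width, "little")
    return raw[0] if raw == bytes([raw[0]]) * width else None


# Values copied from the gdb output in this report.
print(hex(33686018))                         # the 4-byte int/enum fields, in hex
print(repeated_byte(33686018, 4))            # endpoint.type, endpoint.state, ...
print(repeated_byte(0x202020202020202, 8))   # the pointer-sized fields
# gdb printed durability as (PN_DELIVERIES | unknown: 33686016);
# assuming PN_DELIVERIES == 2, the raw field value is the same pattern.
print(repeated_byte(33686016 | 2, 4))
print(repeated_byte(0xfdfdfdfdbabecafe, 8))  # argument to pn_string_get in frame #0
```

The last value, the string pointer passed to pn_string_get in frame #0, is not a uniform fill, so it appears to be a different kind of garbage from the 0x02 pattern in the struct itself.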