[
https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298201#comment-14298201
]
michael j. goulish commented on DISPATCH-106:
---------------------------------------------
OK, will do. Very interesting -- thanks!
> pn link corruption after router restart
> ---------------------------------------
>
> Key: DISPATCH-106
> URL: https://issues.apache.org/jira/browse/DISPATCH-106
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Router Node
> Affects Versions: 0.3
> Reporter: michael goulish
> Fix For: 0.4
>
>
> With the standard 6-node demo network (A-D, X, Y), after killing and
> restarting node Y, I see a bad link on router D, which causes D to crash.
> Here is the sequence of events from the router logs and the topologist
> testing program:
> 01:05:05.367 Killing router Y, pid 20074
> 01:05:05.367 Sleeping 30 seconds
> 01:05:35.367 Restarting router Y, pid 20120
> 01:05:38 Router D : last "valid origins" post to its log file :
> Node QDR.C valid origins: []
> 01:05:46 Router D posts to its log file:
> Exited Router Flux Mode
> 01:06:05.368 checking for crash after node bounce
> ( no crash detected )
> 01:06:17 last post to router D log file
> ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872
> ls_seq=2 mobile_seq=0)
> 01:06:35.369 second check for crash. (none detected)
> 01:06:35.370 getting topology
> ( Node D fails to respond. PID 20072 )
> ( core file, timestamped 01:06 )
> Here is the backtrace from router D's core file:
> {
> #0 pn_string_get (string=0xfdfdfdfdbabecafe) at
> /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
> #1 0x00007ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at
> /home/mick/dispatch/src/router_agent.c:112
> #2 0x00007ff73fa8e7dd in qd_entity_refresh_router_link
> (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
> at /home/mick/dispatch/src/router_agent.c:120
> #3 0x0000003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
> #4 0x0000003e408056bc in ffi_call () from /lib64/libffi.so.6
> #5 0x00007ff737d2dc8b in _ctypes_callproc () from
> /usr/lib64/python2.7/lib-dynload/_ctypes.so
> #6 0x00007ff737d27a85 in PyCFuncPtr_call () from
> /usr/lib64/python2.7/lib-dynload/_ctypes.so
> #7 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
> #8 0x00000036df4de37c in PyEval_EvalFrameEx () from
> /lib64/libpython2.7.so.1.0
> #9 0x00000036df4e21dd in PyEval_EvalCodeEx () from
> /lib64/libpython2.7.so.1.0
> #10 0x00000036df4e088f in PyEval_EvalFrameEx () from
> /lib64/libpython2.7.so.1.0
> #11 0x00000036df4e21dd in PyEval_EvalCodeEx () from
> /lib64/libpython2.7.so.1.0
> #12 0x00000036df4e088f in PyEval_EvalFrameEx () from
> /lib64/libpython2.7.so.1.0
> #13 0x00000036df4e21dd in PyEval_EvalCodeEx () from
> /lib64/libpython2.7.so.1.0
> #14 0x00000036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
> #15 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
> #16 0x00000036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
> #17 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
> #18 0x00000036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
> #19 0x00000036df44a29e in PyObject_CallFunction () from
> /lib64/libpython2.7.so.1.0
> #20 0x00007ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68,
> msg=0x7ff728019bd0, link_id=0)
> at /home/mick/dispatch/src/python_embedded.c:519
> #21 0x00007ff73fa92533 in router_rx_handler (context=0x1db5fd0,
> link=0x7ff730008710, delivery=0x7ff73004cc50)
> at /home/mick/dispatch/src/router_node.c:922
> #22 0x00007ff73fa7fa16 in do_receive (pnd=0x1e359a0) at
> /home/mick/dispatch/src/container.c:221
> #23 0x00007ff73fa7fea3 in process_handler (container=0x1dbd6f0,
> unused=0x1e0a050, qd_conn=0x1e2c6a0)
> at /home/mick/dispatch/src/container.c:362
> #24 0x00007ff73fa80135 in handler (handler_context=0x1dbd6f0,
> conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS,
> qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438
> #25 0x00007ff73fa98346 in process_connector (qd_server=0x1d78460,
> cxtr=0x1e1b9b0)
> at /home/mick/dispatch/src/server.c:322
> #26 0x00007ff73fa98c1f in thread_run (arg=0x1d70d30) at
> /home/mick/dispatch/src/server.c:546
> #27 0x0000003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
> ...
> }
> Let's go up to qd_router_link_name
> at /home/mick/dispatch/src/router_agent.c:112
> (gdb) print * link
> $1 =
> {
> prev = 0x7ff72800b210,
> next = 0x7ff72800b390,
> mask_bit = 3,
> link_type = QD_LINK_ROUTER,
> link_direction = QD_OUTGOING,
> owning_addr = 0x1d7d6c0,
> waypoint = 0x0,
> link = 0x7ff7280099d0,
> connected_link = 0x0,
> ref = 0x7ff72800f350,
> target = 0x0,
> event_fifo =
> {
> head = 0x0,
> tail = 0x0,
> scratch = 0x0,
> size = 0
> },
> msg_fifo =
> {
> head = 0x7ff73003c230,
> tail = 0x7ff73003bb70,
> scratch = 0x7ff73003b9f0,
> size = 102
> }
> }
> (gdb) print * (link->link)
> $2 =
> {
> pn_sess = 0x7ff72804b7b0,
> pn_link = 0x7ff72804d6a0,
> context = 0x7ff72800b2d0,
> node = 0x1db6bb0,
> drain_mode = false
> }
> (gdb) print * (link->link->pn_link)
> $3 = {
> endpoint = {
> type = 33686018,
> state = 33686018,
> error = 0x202020202020202,
> condition = {
> name = 0x202020202020202,
> description = 0x202020202020202,
> info = 0x202020202020202
> },
> remote_condition = {
> name = 0x202020202020202,
> description = 0x202020202020202,
> info = 0x202020202020202
> },
> endpoint_next = 0x202020202020202,
> endpoint_prev = 0x202020202020202,
> transport_next = 0x202020202020202,
> transport_prev = 0x202020202020202,
> modified = 2,
> freed = 2,
> posted_final = 2
> },
> source = {
> address = 0x202020202020202,
> properties = 0x202020202020202,
> capabilities = 0x202020202020202,
> outcomes = 0x202020202020202,
> filter = 0x202020202020202,
> durability = (PN_DELIVERIES | unknown: 33686016),
> expiry_policy = 33686018,
> timeout = 33686018,
> type = 33686018,
> distribution_mode = (PN_DIST_MODE_MOVE | unknown: 33686016),
> dynamic = 2
> },
> target = {
> address = 0x202020202020202,
> properties = 0x202020202020202,
> capabilities = 0x202020202020202,
> outcomes = 0x202020202020202,
> filter = 0x202020202020202,
> durability = (PN_DELIVERIES | unknown: 33686016),
> expiry_policy = 33686018,
> ( etc. -- it's all garbage. )
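
A note on the dump of link->link->pn_link above: the "garbage" is very regular. Every value shown is what you get when each byte of the field is 0x02. One possible reading, which the report does not confirm, is that the pn_link memory was overwritten with a fill byte (for example by a debugging allocator, or after the object was freed) while the router link still pointed at it. The small Python sketch below only checks the arithmetic; the name POISON and the fill-byte interpretation are assumptions made for illustration.

```python
POISON = 0x02  # assumed fill byte, inferred from the dump, not stated in the report


def filled(width):
    """Value of a field of `width` bytes when every byte equals POISON.

    Byte order does not matter here because all bytes are identical.
    """
    return int.from_bytes(bytes([POISON]) * width, "little")


int_field = filled(4)   # 32-bit fields such as endpoint.type and endpoint.state
ptr_field = filled(8)   # 64-bit pointers such as endpoint.error
flag_field = filled(1)  # fields that gdb prints as 2 (modified, freed, posted_final)

print(int_field)        # 33686018
print(hex(ptr_field))   # 0x202020202020202
print(flag_field)       # 2

# gdb prints durability as "(PN_DELIVERIES | unknown: 33686016)". That is the
# same 32-bit value with its low bits (value 2) split off from the remainder.
remainder = int_field & ~0x3
print(remainder)        # 33686016
```

If that reading is right, the interesting question is not the contents of pn_link but why the router link on D still holds a pointer to it after Y is bounced.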