qpid-route -t rdma crashes C++ broker & client with lrg msgs
------------------------------------------------------------
Key: QPID-1855
URL: https://issues.apache.org/jira/browse/QPID-1855
Project: Qpid
Issue Type: Bug
Components: C++ Broker, C++ Client, python tools
Affects Versions: M4, 0.5
Environment: Red Hat Enterprise Linux Server 5. Mellanox Infiniband
MT25208 HCA. OFED-1.3.1 drivers.
Reporter: Gregory Marsh
I've run into a problem when using "qpid-route -t rdma route add" to set up an
rdma federation link between 2 brokers. I've attached some simple code that
replicates the problem by sending just 1 message.
Here's the setup. I have 4 nodes as follows:
(publisher) -> (broker1) -> federation route -> (broker2) -> (consumer)
Through trial and error I've found that when I send 1 message with a payload
size of 7989 or greater, the (broker1) qpidd crashes with the following error:
*******************
qpidd: qpid/amqp_0_10/Connection.cpp:93: virtual size_t
qpid::amqp_0_10::Connection::encode(const char*, size_t): Assertion
`workQueue.empty() || workQueue.front().encodedSize() <= size' failed.
*******************
This does not happen on rdma with message sizes of 7988 or less. It does not
happen with tcp at all.
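Read on its own, the assertion text is an invariant: whatever frame sits at the
front of workQueue must fit into the buffer handed to encode(). The sketch
below is a toy Python model of that invariant only. It is not Qpid's code; the
Frame class, the 8-byte overhead and the buffer sizes are made-up values for
illustration, and it says nothing about why the condition fails on rdma links.

```python
from collections import deque


class Frame:
    """Hypothetical stand-in for an encoded frame (not a Qpid class)."""

    OVERHEAD = 8  # made-up per-frame header size, for illustration only

    def __init__(self, payload_len):
        self.payload_len = payload_len

    def encoded_size(self):
        return self.OVERHEAD + self.payload_len


def encode(work_queue, size):
    """Drain whole frames into a buffer of `size` bytes; return bytes used.

    Mirrors the shape of the failed assertion: the queue is empty, or the
    front frame fits in the buffer offered by the transport.
    """
    assert not work_queue or work_queue[0].encoded_size() <= size
    used = 0
    while work_queue and used + work_queue[0].encoded_size() <= size:
        used += work_queue.popleft().encoded_size()
    return used
```

With these made-up numbers, encode(deque([Frame(100)]), 4096) succeeds, while
encode(deque([Frame(5000)]), 4096) trips the assertion because a single frame
is larger than the buffer it is asked to fit into.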
Here is an explanation of how to use the attached code to (hopefully) replicate
the problem. I set my path and ld lib path env vars by running
"source qpid-m5-env.bash".
0. Compile the 2 cpp files with make and the attached Makefile.
1. Start qpidd on the (broker1) & (broker2) hosts using "start_qpidd.bash"
2. Set up the route between the broker hosts using "simple_fed_rdma.bash". I've
also included "simple_fed_tcp.bash", which does the same over tcp.
3. Start the consumer with "rdma_fed_bug_cons.exe". Use rdma or tcp protocol
according to how you've set up the route in step 2.
$ ./rdma_fed_bug_cons.exe
Usage: ./rdma_fed_bug_cons.exe
        [broker_ip_addr]
        [protocol (tcp|rdma)]
4. Start the publisher with "rdma_fed_bug_pub.exe". Use rdma or tcp protocol
according to how you've set up the route.
$ ./rdma_fed_bug_pub.exe
Usage: ./rdma_fed_bug_pub.exe
        [broker_ip_addr]
        [msg_size]
        [protocol (tcp|rdma)]
Again, with an rdma route and the rdma protocol on both clients, a msg_size of
7989 or greater should trigger the crash.
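For anyone who wants to confirm the threshold on different hardware, the trial
and error can be turned into a bisection over msg_size. This is only a sketch:
it assumes the failure is monotone (once a size fails, every larger size fails
too), and the predicate shown is a stand-in built from the number reported
above. A real predicate would restart the brokers, run the publisher with the
given msg_size and report whether qpidd died.

```python
def smallest_failing_size(fails, lo, hi):
    """Return the smallest n in (lo, hi] for which fails(n) is true.

    Assumes fails(lo) is False, fails(hi) is True, and that failure is
    monotone in n.
    """
    if fails(lo) or not fails(hi):
        raise ValueError("need fails(lo) == False and fails(hi) == True")
    # Invariant: lo is a passing size, hi is a failing size.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Stand-in predicate using the threshold from this report; replace it with
# a function that actually runs the publisher and checks the broker.
def reported(n):
    return n >= 7989


print(smallest_failing_size(reported, 1, 65536))
```

Each probe halves the remaining range, so a search between 1 and 65536 bytes
needs 16 publisher runs.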
My 4 hosts each have the following Mellanox Infiniband HCA with an assigned
IPoIB interface address showing in "ifconfig". We are using OFED-1.3.1 drivers:
$ ibstat
CA 'mthca0'
        CA type: MT25208
        Number of ports: 2
        Firmware version: 5.1.400
        Hardware version: a0
        Node GUID: 0x0002c9020023c300
        System image GUID: 0x0002c9020023c303
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a68
                Port GUID: 0x0002c9020023c301
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0002c9020023c302
Thanks for looking into this. Let me know if you have any problems
compiling/running my code.
Greg Marsh
Network Based Computing Lab
Ohio State University