Hello,
I've run into a problem when using "qpid-route -t rdma route add" to set up an rdma federation link between 2 brokers. I've attached some simple code that replicates the problem by sending just 1 message.
Here's the setup. I have 4 nodes as follows:

(publisher) -> (broker1) -> federation route -> (broker2) -> (consumer)

Through trial and error I've found that when I send 1 message with a payload size of 7989 or greater, the (broker1) qpidd crashes with the following error:
*******************
qpidd: qpid/amqp_0_10/Connection.cpp:93: virtual size_t qpid::amqp_0_10::Connection::encode(const char*, size_t): Assertion `workQueue.empty() || workQueue.front().encodedSize() <= size' failed.
*******************

This does not happen on rdma with message sizes 7888 or less. It does not happen with tcp at all.
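
For reference, the condition named in that assertion can be modelled in isolation. The following is a toy sketch, not the Qpid source: ToyFrame, frontFrameFits and encodeWholeFrames are hypothetical names, and the only thing taken from the error message is the condition itself (the queue is empty, or the frame at its front fits into the buffer of `size` bytes offered to encode).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a queued frame; only its encoded size matters here.
struct ToyFrame {
    std::size_t bytes;
    std::size_t encodedSize() const { return bytes; }
};

// The condition quoted in the assertion message: the queue is empty, or the
// frame at its front fits into a buffer of `size` bytes.
bool frontFrameFits(const std::deque<ToyFrame>& workQueue, std::size_t size) {
    return workQueue.empty() || workQueue.front().encodedSize() <= size;
}

// Toy encode step: checks the condition, then consumes whole frames while they
// fit into the remaining space. Returns the number of bytes "written".
std::size_t encodeWholeFrames(std::deque<ToyFrame>& workQueue, std::size_t size) {
    assert(frontFrameFits(workQueue, size));  // aborts if the first frame is too big
    std::size_t used = 0;
    while (!workQueue.empty() && workQueue.front().encodedSize() <= size - used) {
        used += workQueue.front().encodedSize();
        workQueue.pop_front();
    }
    return used;
}
```

With assertions enabled, calling encodeWholeFrames with a 200-byte front frame and a size of 150 aborts in the same way the broker did. The sketch only illustrates what the check guards against; it says nothing about why the real frame exceeded the buffer on the rdma link.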
Here is an explanation of how to use the attached code to (hopefully) replicate the problem. I set my PATH and LD_LIBRARY_PATH env vars by running "source qpid-m5-env.bash".

0. Compile the 2 cpp files with make and the attached Makefile.
1. Start qpidd on the (broker1) and (broker2) hosts using "start_qpidd.bash".
2. Set up the route between the broker hosts using "simple_fed_rdma.bash". I've also included "simple_fed_tcp.bash", which does the same over tcp.
3. Start the consumer with "rdma_fed_bug_cons.exe". Use the rdma or tcp protocol according to how you set up the route in step 2.

$ ./rdma_fed_bug_cons.exe
Usage: ./rdma_fed_bug_cons.exe [broker_ip_addr] [protocol (tcp|rdma)]

4. Start the publisher with "rdma_fed_bug_pub.exe". Use the rdma or tcp protocol according to how you set up the route.

$ ./rdma_fed_bug_pub.exe
Usage: ./rdma_fed_bug_pub.exe [broker_ip_addr] [msg_size] [protocol (tcp|rdma)]

Again, with an rdma route and the rdma protocol on the clients, a msg_size of 7989 or greater should crash the (broker1) qpidd.
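
The trial-and-error search for the smallest crashing msg_size can be narrowed systematically with a bisection. This is a minimal sketch, assuming the crash is monotonic in msg_size (every size at or above the threshold crashes, every smaller one does not). smallestFailingSize and the fails callback are hypothetical names that are not part of the attached code; in practice fails would wrap one publisher run against freshly started brokers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the smallest size in (lo, hi] for which fails(size) is true.
// Assumes fails(lo) is false, fails(hi) is true, and failure is monotonic.
std::size_t smallestFailingSize(std::size_t lo, std::size_t hi,
                                const std::function<bool(std::size_t)>& fails) {
    assert(lo < hi);
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (fails(mid)) {
            hi = mid;  // threshold is at mid or below
        } else {
            lo = mid;  // threshold is above mid
        }
    }
    return hi;
}
```

Starting from a known-good and a known-bad size, this needs roughly log2(hi - lo) publisher runs to pin down the exact boundary.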
My 4 hosts each have the following Mellanox InfiniBand HCA with an assigned IPoIB interface address showing in "ifconfig". We are using OFED-1.3.1 drivers. The OS is Red Hat Enterprise Linux Server 5.

$ ibstat
CA 'mthca0'
    CA type: MT25208
    Number of ports: 2
    Firmware version: 5.1.400
    Hardware version: a0
    Node GUID: 0x0002c9020023c300
    System image GUID: 0x0002c9020023c303
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a68
        Port GUID: 0x0002c9020023c301
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c9020023c302

Thanks for looking into this. Let me know if you have any problems
compiling/running my code.

Greg Marsh
Network Based Computing Lab
Ohio State University

rdma_fed_bug.tgz
Description: GNU Zip compressed data
