On 04/04/2013 06:45 PM, Or Gerlitz wrote:
>
> Kleber, as for the 1st problem, which kernel consumers are hanging
> for ever on their CQs? IPoIB is giving up after sometime e.g see in
> ipoib_ib.c "assume the HW is wedged and just free up all our pending
> work requests"
>
Or,

I don't have a comprehensive test case that stresses most of the IB
stack during error recovery, but in my tests the kernel consumers that
are still hanging are the following.

The ib_sa module, where mcast_remove_one() is waiting for the port
completion queue:

INFO: task eehd:4689 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd            D 0000000000000000     0  4689      2 0x00010000
Call Trace:
[c0000000fba83190] [0000000000000001] 0x1 (unreliable)
[c0000000fba83360] [c000000000016188] .__switch_to+0x140/0x268
[c0000000fba83410] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000fba836b0] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000fba837c0] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000fba838a0] [d000000002ca230c] .mcast_remove_one+0xfc/0x168 [ib_sa]
[c0000000fba83940] [d000000002bc4f60] .ib_unregister_device+0x78/0x170 [ib_core]
...

Or rdma_cm, which is waiting for the cma_dev completion:

Call Trace:
[c0000000f8fc70f0] [0000000000000001] 0x1 (unreliable)
[c0000000f8fc72c0] [c000000000016188] .__switch_to+0x140/0x268
[c0000000f8fc7370] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000f8fc7610] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000f8fc7720] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000f8fc7800] [d000000002f835b0] .cma_process_remove+0x170/0x1a8 [rdma_cm]
[c0000000f8fc78b0] [d000000002f8366c] .cma_remove_one+0x84/0xb0 [rdma_cm]
[c0000000f8fc7940] [d000000002c34f60] .ib_unregister_device+0x78/0x170 [ib_core]
...

Thanks,
kleber

--
Kleber Sacilotto de Souza
IBM Linux Technology Center
