On 03/02/2013 07:14, Bharath Ramesh wrote:
Intermittently, a couple of nodes in our cluster throw the error "Failed to obtain HW semaphore, aborting" on boot. When this error occurs, we are unable to use IB on those nodes, and unloading and reloading the module doesn't help.

Load mlx4_core with debug_level=1 and send the resulting dmesg output along with the lspci info for the card ("$ lspci | grep Mellanox").
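For reference, a rough sketch of how that could be collected in one go. This is only an illustration under a few assumptions: it is run as root, mlx4_ib and mlx4_en are the only modules sitting on top of mlx4_core (on an OFED install other modules may hold references and need unloading first), and the output file name and the DRY_RUN switch are made up for this example. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: reload mlx4_core with debug tracing and gather the requested output.
# DRY_RUN and OUT are hypothetical knobs for this example, not driver options.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute them
OUT="${OUT:-mlx4-debug.txt}"     # arbitrary output file name

# Print the command in dry-run mode, otherwise execute it through sh.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

collect_mlx4_debug() {
    # Unload the dependent modules first (assumed set), then mlx4_core itself.
    run "modprobe -r mlx4_ib mlx4_en mlx4_core"
    # Reload the core driver with debug tracing enabled.
    run "modprobe mlx4_core debug_level=1"
    # Save the driver messages and the PCI listing for the card.
    run "dmesg | grep -i mlx4 > $OUT"
    run "lspci | grep Mellanox >> $OUT"
}

collect_mlx4_debug
```

If the failure only shows up at boot, the same option can presumably be made persistent through a modprobe configuration entry instead of reloading by hand.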


I was wondering what could be causing this error. Google only brings up the source code, with no discussion of the error. We are using OFED-1.5.4. Any help in debugging and resolving this issue is greatly appreciated.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
