This morning both my InfiniBand and TCP Lustre clients hiccuped. They were evicted from the server, presumably as a result of their high load and the consequent timeouts. My question is: why don't the clients reconnect? Both the InfiniBand and TCP clients give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've been battling with this on and off for a few months now. I've upgraded my InfiniBand switch firmware, and all the clients and servers are running the latest version of Lustre and the Lustre-patched kernel. Any ideas?
-Aaron
