Hi,

We are testing Lustre 1.6b5 on an InfiniBand cluster running RHEL 4. We use the binaries provided by CFS, with OFED 1.1 as the IB stack.

Several times, some of the clients have hung during the filesystem mount or when an OST was added (see the log below).
Error:
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148

From the OFED headers:
enum ib_cm_rej_reason {
      /* ... */
      IB_CM_REJ_INVALID_SERVICE_ID            = 8,
      /* ... */
};
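
For reference, the reason code can be pulled out of a kiblnd_rejected() line and matched against the enum value quoted above. This is only a sketch: the regular expression is an assumption based on the single log line shown in this post, decode_reject is a hypothetical helper name, and the lookup table deliberately holds only the one value quoted from OFED.

```python
import re

# Only the value quoted from the OFED header in this post; other reasons are omitted on purpose.
IB_CM_REJ_REASONS = {8: "IB_CM_REJ_INVALID_SERVICE_ID"}

# Pattern assumed from the kiblnd_rejected() line shown in this post.
REJECT_RE = re.compile(r"kiblnd_rejected\(\)\).*rejected: reason (\d+), size (\d+)")


def decode_reject(line):
    """Return (reason_code, reason_name, size), or None if the line does not match."""
    m = REJECT_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return code, IB_CM_REJ_REASONS.get(code, "unknown"), int(m.group(2))


if __name__ == "__main__":
    sample = ("LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) "
              "[EMAIL PROTECTED] rejected: reason 8, size 148")
    print(decode_reject(sample))
```

Codes that are not in the table come back as "unknown" rather than being guessed.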

Once an IPoIB ping to the corresponding OST is started, the client continues. Afterwards it is quite stable.
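
The ping workaround could be scripted so that every OSS is pinged over IPoIB once before the client mounts. This is a minimal sketch of the workaround only, not a fix for the underlying rejection. It assumes the Linux iputils ping (-c for the packet count, -W for the reply timeout in seconds); the function names and the addresses are hypothetical placeholders.

```python
import subprocess


def build_ping_cmd(host, count=1, timeout_s=2):
    """Build a Linux iputils ping command (-c packet count, -W reply timeout in seconds)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def run_ping(cmd):
    """Run the command and report whether ping exited with status 0."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def warm_up(hosts, runner=run_ping):
    """Ping every host once and return {host: reachable}."""
    return {host: runner(build_ping_cmd(host)) for host in hosts}


if __name__ == "__main__":
    # Hypothetical IPoIB addresses of the OSS nodes; replace with the real ones.
    oss_ipoib_addrs = ["192.168.0.11", "192.168.0.12"]
    for host, ok in warm_up(oss_ipoib_addrs).items():
        print(host, "reachable" if ok else "no reply")
```

The runner argument exists only so the ping step can be replaced by a stub when trying the script without a cluster.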

Any idea how this could be fixed?

Thanks,
Mirko

Lustre:   mount data:
Lustre: profile: testfs-client
Lustre: device:  [EMAIL PROTECTED]:/testfs
Lustre: flags:   2
Lustre:   0 UP mgc [EMAIL PROTECTED] 438411f9-d2cc-f576-9a5d-bc927badfa60 5
Lustre:   1 UP lov testfs-clilov-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 3
Lustre:   2 UP mdc testfs-MDT0000-mdc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre:   3 UP osc testfs-OST0000-osc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre:   4 UP osc testfs-OST0001-osc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre: mount [EMAIL PROTECTED]:/testfs complete
Lustre: client 0000010075688000 umount complete
Lustre:   mount data:
Lustre: profile: testfs-client
Lustre: device:  [EMAIL PROTECTED]:/testfs
Lustre: flags:   2
Lustre:   0 UP mgc [EMAIL PROTECTED] caf868ce-f8dc-8c83-ecd4-caf4a75378f2 5
Lustre:   1 UP lov testfs-clilov-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 3
Lustre:   2 UP mdc testfs-MDT0000-mdc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   3 UP osc testfs-OST0000-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   4 UP osc testfs-OST0001-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   5 UP osc testfs-OST0002-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   6 UP osc testfs-OST0003-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   7 UP osc testfs-OST0004-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   8 UP osc testfs-OST0005-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:   9 UP osc testfs-OST0006-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre:  10 UP osc testfs-OST0007-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: mount [EMAIL PROTECTED]:/testfs complete
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 1 previous similar message
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521780, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521805, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 5 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521830, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 1 previous similar message
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521855, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521880, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521905, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 17 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521930, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 1 previous similar message
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521955, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521980, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166522005, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from [EMAIL PROTECTED] failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) [EMAIL PROTECTED] rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for [EMAIL PROTECTED]: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166522030, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]              2    up     8     8     8     8     6 0
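
The log above repeats the same reject/timeout cycle, so a short script can condense it into a rejection count, the spacing between retries, and the status codes seen. This is a hedged sketch: the patterns are assumptions taken from the lines in this post, summarize is a hypothetical helper, and the errno table holds only the one Linux value (113, EHOSTUNREACH, "No route to host") that appears here as status -113.

```python
import re

# Patterns assumed from the log lines in this post.
SENT_RE = re.compile(r"sent at (\d+)")
STATUS_RE = re.compile(r"status (-?\d+)")

# Linux errno value seen in this log; the table is intentionally minimal.
LINUX_ERRNO = {113: "EHOSTUNREACH"}


def summarize(log_text):
    """Count rejections, compute gaps between 'sent at' timestamps, decode statuses."""
    sent = [int(t) for t in SENT_RE.findall(log_text)]
    gaps = [later - earlier for earlier, later in zip(sent, sent[1:])]
    statuses = sorted({int(s) for s in STATUS_RE.findall(log_text)})
    return {
        "rejections": log_text.count("kiblnd_rejected()"),
        "retry_gaps_s": gaps,
        # Kernel code reports errors as negative errno values, hence the sign flip.
        "statuses": [(s, LINUX_ERRNO.get(-s, "unknown")) for s in statuses],
    }


if __name__ == "__main__":
    import sys
    print(summarize(sys.stdin.read()))
```

In the log above, successive "sent at" timestamps (1166521780, 1166521805, 1166521830, and so on) are 25 seconds apart, so the client retries the connection on a fixed interval until the peer accepts it.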