Hi,
Can anybody read these messages and give me a hint where to
look for the problem? I'm hitting this LBUG fairly easily, due to either
(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
or
(o2iblnd_cb.c:171:kiblnd_get_idle_tx()) ASSERTION(tx->tx_sending == 0) failed
I'm using Lustre 1.6.1 as downloaded, on top of RHEL4U5 with o2ib, and I hit
this a few times per day while writing huge files with "dd".
Any hint on where to look into this further would be very welcome! More of the
log surrounding the error message is below.
Best regards,
Erich
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection restored to service necd3-OST0000 using nid [EMAIL PROTECTED]
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 00000100156ba000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 000001001f1f4000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 00000100a10b0000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 000001002e3fa000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 0000010066604000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 0000010070022000
LustreError: 6820:0:(events.c:55:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x1000806/t0 o400->[EMAIL PROTECTED]@o2ib_0:26 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6820:0:(events.c:55:request_out_callback()) Skipped 6 previous similar messages
LustreError: 6820:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Lustre: 6819:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6819
kiblnd_sd_00 R running task 0 6819 1 6824 6820 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c5a0 0000000000000000 0000000000000005 ffffffffa0288894
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
<ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<0>LustreError: 6820:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
kiblnd_sd_01 R running task 0 6820 1 6819 6821 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c6d0 0000000000000000 0000000000000005 ffffffffa0288894
<ffffffff80133741>{__wake_up+54}
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
<ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<ffffffff8013369a>{default_wake_function+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80133741>{__wake_up+54}<ffffffff80110ddb>{child_rip+0}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) @@@ network error (sent at 1187792190, 0s ago) [EMAIL PROTECTED] x1000806/t0 o400->[EMAIL PROTECTED]@o2ib_0:26 lens 128/128 ref 1 fl Rpc:N/0/0 rc 0/-22
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
LustreError: 166-1: [EMAIL PROTECTED]: Connection to service MGS via nid [EMAIL PROTECTED] was lost; in progress operations using this service will fail.
<ffffffff8013369a>{default_wake_function+0} <1>LustreError: dumping log to /tmp/lustre-log.1187792190.6819
<ffffffff80110de3>{child_rip+8}
<ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80110ddb>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1187792190.6820
LustreError: 2697:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x1000808/t0 o400->[EMAIL PROTECTED]@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 2697:0:(events.c:55:request_out_callback()) Skipped 1 previous similar message
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with [EMAIL PROTECTED]
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection to service necd3-OST0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000867/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with [EMAIL PROTECTED]
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with [EMAIL PROTECTED]
LustreError: 6823:0:(events.c:55:request_out_callback()) @@@ type 4, status -103 [EMAIL PROTECTED] x1000850/t0 o400->[EMAIL PROTECTED]@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6823:0:(events.c:55:request_out_callback()) Skipped 1 previous similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000871/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187792290, 100s ago) [EMAIL PROTECTED] x1000856/t0 o250->[EMAIL PROTECTED]@o2ib_0:26 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 26 previous similar messages
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with [EMAIL PROTECTED]
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Skipped 1 previous similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000886/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000890/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000905/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000913/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000924/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187792641, 100s ago) [EMAIL PROTECTED] x1000917/t0 o38->[EMAIL PROTECTED]@o2ib:12 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 63 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x1000939/t0 o101->[EMAIL PROTECTED]@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss