Hi,

I was testing Lustre 1.6 beta 5 with the standard Linux 2.6.9-42 kernel
on an x86_64 (CentOS 4) installation, with OFED-1.1 (IB) as the
interconnect. In this setup, 8 nodes share one OST (80G) each, and a
separate node shares two OSTs and also acts as the MGS/MDS.

The 8 nodes also mount the exported file system. When I run iozone on
the Lustre file system from the head node (which is only a client) and
visit that directory from any other node, there are crashes. The logs
are included below.

Could somebody tell me what might be wrong? I understand that this is
not a recommended configuration, but I just wanted to check the
scalability.
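
In case it helps when reading the logs below: the negative status values
in the Lustre messages (-5, -12, -17, -103, -110) appear to be negated
Linux errno codes. Here is a minimal Python sketch for decoding them. The
table is hardcoded from the Linux x86_64 errno numbering, and the names
LINUX_ERRNO, decode_status and statuses_in are purely illustrative; they
are not part of Lustre.

```python
import re

# Linux x86_64 errno numbers that show up in the logs. Hardcoded on the
# assumption that the Lustre status codes are plain negated errno values.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    12: ("ENOMEM", "Out of memory"),
    17: ("EEXIST", "File exists"),
    103: ("ECONNABORTED", "Software caused connection abort"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_status(code):
    """Map a (negative) status code to its errno name and description."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in table"))
    return f"{code}: {name} ({text})"


def statuses_in(line):
    """Decode every 'status N', 'error N', 'err N' or 'rc = N' in a log line."""
    pattern = r"(?:status|[Ee]rror|rc =|rc|err)\s*(-\d+)"
    return [decode_status(int(m)) for m in re.findall(pattern, line)]


if __name__ == "__main__":
    print(statuses_in("client_bulk_callback()) event type 0, status -103"))
    print(statuses_in("kiblnd_check_sends()) Error -12 posting transmit"))
```

Read this way, the "SQ ... full" message from ib_mthca followed by
"Error -12 posting transmit" would correspond to ENOMEM on the send
queue, with the later -103 and -110 values being connection aborts and
timeouts.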

Thanks

Anand
 

##########HEAD NODE (CLIENT ONLY)###################################
Lustre: OBD class driver, [EMAIL PROTECTED]
        Lustre Version: 1.5.95
        Build Version:
1.5.95-19691231160000-PRISTINE-.usr.src.linux-2.6.9-42.EL_lustre.1.5.95smp
Lustre: Added LNI [EMAIL PROTECTED] [8/64]
Lustre: Lustre Client File System; [EMAIL PROTECTED]
Lustre:   mount data:
Lustre: profile: lfs-client
Lustre: device:  [EMAIL PROTECTED]:/lfs
Lustre: flags:   2
Lustre:   0 UP mgc [EMAIL PROTECTED]
59df7475-a854-b1bc-abc4-1ad9b947113a 5
Lustre:   1 UP lov lfs-clilov-000001007e2eb400
062c1341-76b7-b41d-90be-68497e189830 3
Lustre:   2 UP mdc lfs-MDT0000-mdc-000001007e2eb400
062c1341-76b7-b41d-90be-68497e189830 4
Lustre:   3 UP osc lfs-OST0000-osc-000001007e2eb400
062c1341-76b7-b41d-90be-68497e189830 4
Lustre:   4 UP osc lfs-OST0001-osc-000001007e2eb400
062c1341-76b7-b41d-90be-68497e189830 4
Lustre: mount [EMAIL PROTECTED]:/lfs complete
Losing some ticks... checking if CPU frequency changed.
LustreError: 7476:0:(lib-move.c:93:lnet_try_match_md()) Matching packet
from [EMAIL PROTECTED], match 365275 length 808 too big: 560 left, 5
Lustre: 7476:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from
[EMAIL PROTECTED] portal 10 match 365275 offset 0 length 808: 2
LustreError: 19139:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162941625, 100s ago)
Lustre: 19139:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
2    up     8     8     8     8    -8 0
LustreError: lfs-MDT0000-mdc-000001007e2eb400: Connection to service
lfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using
Lustre: lfs-MDT0000-mdc-000001007e2eb400: Connection restored to service
lfs-MDT0000 using nid [EMAIL PROTECTED]
ib_mthca 0000:02:00.0: SQ c9040a full (485434 head, 483377 tail, 2064
max, 7 nreq)
LustreError: 7477:0:(o2iblnd_cb.c:976:kiblnd_check_sends()) Error -12
posting transmit to [EMAIL PROTECTED]
LustreError: 7477:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -12(waiting)
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -12, desc 000001002d302000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 0000010101dca000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 000001002e270000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 0000010085b7a000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 000001007c8ca000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 0000010106992000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 000001009039a000
LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
status -103, desc 0000010143248000
LustreError: 7507:0:(client.c:904:ptlrpc_check_set()) @@@ bulk transfer
failed
LustreError: 7507:0:(linux-debug.c:130:lbug_with_loc()) LBUG
Lustre: 7507:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
stack for process 7507
ptlrpcd       S 00000100784b9800     0  7507      1          7508  7480
(L-TLB)
0000000300000000 00000100410739a8 0000000000000000 0000000000000001
       0000000000000246 0000000000000003 000001012969c000
0000000000000004
       0000000000000000 ffffffffa03b7630
Call Trace:<ffffffff80148773>{__kernel_text_address+26}
<ffffffff80111600>{show_trace+375}
       <ffffffff8011173c>{show_stack+241}
<ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
       <ffffffffa03b978d>{:ptlrpc:ptlrpc_check_set+1782}
<ffffffffa03d8a47>{:ptlrpc:ptlrpcd_check+279}
       <ffffffffa03d8d36>{:ptlrpc:ptlrpcd+533}
<ffffffff801331a9>{default_wake_function+0}
       <ffffffffa03b850b>{:ptlrpc:ptlrpc_expired_set+0}
<ffffffffa03b850b>{:ptlrpc:ptlrpc_expired_set+0}
       <ffffffff801331a9>{default_wake_function+0}
<ffffffff80131f2f>{schedule_tail+55}
       <ffffffff80110e23>{child_rip+8}
<ffffffffa03d8b21>{:ptlrpc:ptlrpcd+0}
       <ffffffff80110e1b>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1162941923.7507
Lustre: 7507:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked LNET
upcall /usr/lib/lustre/lnet_upcall
LBUG,/usr/src/redhat/BUILD/lustre-1.5.95/
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-MDT0000_UUID INACTIVE by administrator request
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007ee1e458 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007e6fa8c0 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 70
previous similar messages
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007f34a370 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 273
previous similar messages
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0001_UUID INACTIVE by administrator request
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 1
previous similar message
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007f8c3330 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 383
previous similar messages
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0002_UUID INACTIVE by administrator request
LustreError: 19691:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
similar message
LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
similar messages
LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
similar messages
LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 19693:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
fails: rc = -5
LustreError: 19717:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
fails: rc = -5
LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007fae07f0 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 294
previous similar messages
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0003_UUID INACTIVE by administrator request
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
of page 000001007f45b488 failed: -5
LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped
1386 previous similar messages
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0005_UUID INACTIVE by administrator request
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 1
previous similar message
LustreError: 19814:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19814:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
similar messages
LustreError: 19814:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 19812:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 19812:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683700
LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0006_UUID INACTIVE by administrator request
LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
LustreError: 20058:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 20058:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 21 previous
similar messages
LustreError: 20058:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 20058:0:(file.c:2215:ll_inode_revalidate_fini()) Skipped 21
previous similar messages
LustreError: 20059:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 20059:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 20060:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
fails: rc = -5
LustreError: 18111:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 18111:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 18111:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 18111:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-MDT0000_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
previous similar messages
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0003_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 3
previous similar messages
LustreError: 7478:0:(o2iblnd_cb.c:1022:kiblnd_tx_complete()) tx ->
[EMAIL PROTECTED] type d0 cookie 0xea03dsending 1 waiting 0: failed 12
LustreError: 7478:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -5(waiting)
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0006_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
previous similar messages
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-MDT0000_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
previous similar messages
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0003_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 3
previous similar messages
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
import lfs-OST0006_UUID INACTIVE by administrator request
Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
previous similar messages
LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
LustreError: 20863:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 20863:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21239:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21239:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21240:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21240:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21344:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21344:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21345:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21345:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21346:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21347:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21347:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
similar message
LustreError: 21418:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21420:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
fails: rc = -5
LustreError: 21422:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
-5
LustreError: 21422:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
similar message
LustreError: 21422:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
inode 2683697
LustreError: 21422:0:(file.c:2215:ll_inode_revalidate_fini()) Skipped 1
previous similar message


##########################OST Node (one of them)#######################
LDISKFS FS on sdb, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: OBD class driver, [EMAIL PROTECTED]
        Lustre Version: 1.5.95
        Build Version:
1.5.95-19691231160000-PRISTINE-.usr.src.linux-2.6.9-42.EL_lustre.1.5.95smp
Lustre: Added LNI [EMAIL PROTECTED] [8/64]
Lustre: Lustre Client File System; [EMAIL PROTECTED]
Lustre:   mount data:
Lustre: device:  /dev/sdb
Lustre: flags:   0
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdb, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre:   disk data:
Lustre: server:  lfs-OSTffff
Lustre: uuid:
Lustre: fs:      lfs
Lustre: index:   ffff
Lustre: config:  1
Lustre: flags:   0x72
Lustre: diskfs:  ldiskfs
Lustre: options: errors=remount-ro,extents,mballoc
Lustre: params:   [EMAIL PROTECTED]
Lustre: comment:
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on sdb, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre:   disk data:
Lustre: server:  lfs-OST0002
Lustre: uuid:
Lustre: fs:      lfs
Lustre: index:   0002
Lustre: config:  2
Lustre: flags:   0x2
Lustre: diskfs:  ldiskfs
Lustre: options: errors=remount-ro,extents,mballoc
Lustre: params:   [EMAIL PROTECTED]
Lustre: comment:
Lustre: Filtering OBD driver; [EMAIL PROTECTED]
Lustre: lfs-OST0002: new disk, initializing
Lustre: OST lfs-OST0002 now serving dev
(lfs-OST0002/d4578f7c-974c-4317-9537-a5d96bfe529c) with recovery enabled
Lustre:   0 UP mgc [EMAIL PROTECTED]
c6fccd10-45a7-90ea-8d4b-7411ba7bd65f 6
Lustre:   1 UP ost OSS OSS_uuid 3
Lustre:   2 UP obdfilter lfs-OST0002 lfs-OST0002_UUID 3
Lustre: mount /dev/sdb complete
Lustre: lfs-OST0002: received MDS connection from [EMAIL PROTECTED]
Lustre:   mount data:
Lustre: profile: lfs-client
Lustre: device:  [EMAIL PROTECTED]:/lfs
Lustre: flags:   2
Lustre:   0 UP mgc [EMAIL PROTECTED]
c6fccd10-45a7-90ea-8d4b-7411ba7bd65f 5
Lustre:   1 UP ost OSS OSS_uuid 3
Lustre:   2 UP obdfilter lfs-OST0002 lfs-OST0002_UUID 9
Lustre:   3 UP lov lfs-clilov-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 3
Lustre:   4 UP mdc lfs-MDT0000-mdc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:   5 UP osc lfs-OST0000-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:   6 UP osc lfs-OST0001-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:   7 UP osc lfs-OST0002-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:   8 UP osc lfs-OST0003-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:   9 UP osc lfs-OST0004-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:  10 UP osc lfs-OST0005-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:  11 UP osc lfs-OST0006-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre:  12 UP osc lfs-OST0007-osc-000001007103b400
45081e76-b115-3f24-80c8-ab1470284e1f 4
Lustre: mount [EMAIL PROTECTED]:/lfs complete
Lustre: lfs-OST0002: haven't heard from client
062c1341-76b7-b41d-90be-68497e189830 (at [EMAIL PROTECTED]) in 237 seconds.
I think it's dead, and I am evicting it.
ib_mthca 0000:02:00.0: SQ 00040a full (198882 head, 196825 tail, 2064
max, 7 nreq)
LustreError: 5937:0:(o2iblnd_cb.c:976:kiblnd_check_sends()) Error -12
posting transmit to [EMAIL PROTECTED]
LustreError: 5937:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -12(waiting)
LustreError: 5937:0:(events.c:127:client_bulk_callback()) event type 0,
status -12, desc 000001005e632000
LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
[EMAIL PROTECTED] failed: 5
LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
[EMAIL PROTECTED] failed: 5
LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 29
previous similar messages
LustreError: 5936:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001003b480000
LustreError: 5936:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 0000010135818000
LustreError: 5938:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001000ee8a000
LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
[EMAIL PROTECTED] failed: 5
LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 688
previous similar messages
LustreError: 5936:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 5938:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001000bfba000
LustreError: 5937:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 5937:0:(linux-debug.c:130:lbug_with_loc()) LBUG
LustreError: 5938:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
Lustre: 5937:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
stack for process 5937
kiblnd_sd_02  R  running task       0  5937      1          5938  5936
(L-TLB)
kiblnd_sd_03  00000100080017e0 0000000000000000 0000000000000001
0000000000000216
       0000000000000012 0000000000000001 ffffffffa0396f28
ffffffff80111731
       000001007c049dd8 000000000000019c R  running task       0  5938
1          5939
Call Trace:  5937 (L-TLB)
00000100080017e0 0000000000000000 0000000000000001 0000000000000216
       0000000000000012 0000000000000001 ffffffffa0396f28
ffffffff80111731
       000001014a6bddd8 000000000000019c
Call Trace:<ffffffff80111600>{show_trace+375}
<ffffffff80111600>{show_trace+375} <0>LustreError:
5936:0:(linux-debug.c:130:lbug_with_loc()) LBUG
LustreError: 5936:0:(linux-debug.c:130:lbug_with_loc()) Skipped 1
previous similar message
Lustre: 5936:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
stack for process 5936
Lustre: 5936:0:(linux-debug.c:155:libcfs_debug_dumpstack()) Skipped 1
previous similar message
kiblnd_sd_01  R  running task       0  5936      1          5937  5935
(L-TLB)
00000100080017e0 0000000000000001 0000000100000001 0000000000000003
       0000000000000012 0000000000000001 ffffffffa0396f28
ffffffff80111731
       000001014a6bbdd8 000000000000019c
Call Trace:<ffffffff8011173c>{show_stack+241}
       <ffffffff8011173c>{show_stack+241}
       <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
       <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT
+0}<ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}

       <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
       <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
       <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
       <ffffffff80111600>{show_trace+375} <ffffffff8011173c>{show_stack
+241}
       <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
       <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
       <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
       <ffffffff801331a9>{default_wake_function+0}
<ffffffff801331a9>{default_wake_function+0}
<ffffffff801331a9>{default_wake_function+0}
<ffffffff80131f2f>{schedule_tail+55}
       <ffffffff80131f2f>{schedule_tail+55}
       <ffffffff80110e23>{child_rip+8} <ffffffff80110e23>{child_rip+8}
<ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
       <ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
       <ffffffff80110e1b>{child_rip+0} <ffffffff80110e1b>{child_rip+0}

<ffffffff80131f2f>{schedule_tail+55}
       <0>LustreError: 5935:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
<ffffffff80110e23>{child_rip+8}kiblnd_sd_00  R  running task       0
5935      1          5936   5420 (L-TLB)
000001014d126c00 00000000c00201f9 0000000100000001 0000000000000216
       000001007c047eb8 000000019237a000 0000000088000001
ffffffff80111731
       000001007c047dd8 000000000000019c
Call Trace:<ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
       <ffffffff80111600>{show_trace+375} <ffffffff80110e1b>{child_rip
+0}
<ffffffff8011173c>{show_stack+241}
       <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
       <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
       <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
       <1>LustreError: dumping log to /tmp/lustre-log.1162942447.5936
<ffffffff801331a9>{default_wake_function+0}
<ffffffff80131f2f>{schedule_tail+55}
       <ffffffff80110e23>{child_rip+8}
<ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
       <ffffffff80110e1b>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1162942447.5937
LustreError: dumping log to /tmp/lustre-log.1162942447.5935
LustreError: dumping log to /tmp/lustre-log.1162942447.5938
Lustre: 5938:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked LNET
upcall /usr/lib/lustre/lnet_upcall
LBUG,/usr/src/redhat/BUILD/lustre-1.5.95/lnet/libcfs/tracefile.c,libcfs_assertion_failed,412
LustreError: can't open /tmp/lustre-log.1162942447.5938 file: err -17
LustreError: can't open /tmp/lustre-log.1162942447.5938 for dump: rc -17
LustreError: can't open /tmp/lustre-log.1162942447.5938 file: err -17
LustreError: can't open /tmp/lustre-log.1162942447.5938 for dump: rc -17
LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error 0(waiting)
LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error 0(waiting)
LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error 0(waiting)
LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error
0(sending)(sending_rsrvd)(sending_nocred)(waiting)
LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
status -103
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942447, 55s ago)
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
10    up     8     8     8     0    -1 3136
LustreError: lfs-OST0004-osc-000001007103b400: Connection to service
lfs-OST0004 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error
-110(sending)(sending_rsrvd)(sending_nocred)(waiting)
LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
status -103
LustreError: 5939:0:(events.c:51:request_out_callback()) Skipped 1
previous similar message
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942447, 57s ago)
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
24    up     8     8     8   -13   -14 7368
LustreError: lfs-OST0000-osc-000001007103b400: Connection to service
lfs-OST0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Lustre: 1731:0:(o2iblnd_cb.c:2147:kiblnd_passive_connect()) Conn race
[EMAIL PROTECTED]
LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error
0(sending)(sending_rsrvd)(sending_nocred)(waiting)
LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
status -103
LustreError: 5939:0:(events.c:51:request_out_callback()) Skipped 1
previous similar message
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942447, 58s ago)
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
10    up     8     8     8     0    -1 3136
LustreError: lfs-OST0007-osc-000001007103b400: Connection to service
lfs-OST0007 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942447, 100s ago)
LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5     1 760
LustreError: lfs-OST0008-osc-000001007103b400: Connection to service
lfs-OST0008 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5     1 760
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5     1 760
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5     1 760
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5     1 760
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
6    up     8     8     8     4     1 1104
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
6    up     8     8     8     4     1 1104
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
8    up     8     8     8     2     0 2032
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
8    up     8     8     8     2     0 2032
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
8    up     8     8     8     2     0 2032
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
9    up     8     8     8     1     0 2376
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
4    up     8     8     8     6    -1 336
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
4    up     8     8     8     6    -1 336
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
4    up     8     8     8     6    -1 336
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
4    up     8     8     8     6    -1 336
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
15    up     8     8     8    -2    -2 3728
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
previous similar message
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Skipped 1 previous similar message
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942502, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped
41 previous similar messages
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
11    up     8     8     8    -1    -1 3480
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942504, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
27    up     8     8     8   -17   -17 8320
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942505, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
11    up     8     8     8    -1    -1 3480
LustreError: 7261:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942520, 100s ago)
LustreError: 7261:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 7261:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
27    up     8     8     8   -17   -17 8320
LustreError: lfs-MDT0000-mdc-000001007103b400: Connection to service
lfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
LustreError: Skipped 2 previous similar messages
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942547, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
previous similar message
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
6    up     8     8     8     4     1 1104
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
9    up     8     8     8     1     0 2376
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
5    up     8     8     8     5    -1 680
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
previous similar message
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Skipped 1 previous similar message
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 2
previous similar messages
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Skipped 2 previous similar messages
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942620, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 5
previous similar messages
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
29    up     8     8     8   -19   -19 9008
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
29    up     8     8     8   -19   -19 9008
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
12    up     8     8     8    -2    -2 3824
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
12    up     8     8     8    -2    -2 3824
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942670, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 7
previous similar messages
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
10    up     8     8     8     0     0 2720
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
6    up     8     8     8     4    -1 1024
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
7    up     8     8     8     3     1 1448
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942745, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 5
previous similar messages
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
31    up     8     8     8   -21   -21 9696
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
31    up     8     8     8   -21   -21 9696
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
13    up     8     8     8    -3    -3 4168
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
13    up     8     8     8    -3    -3 4168
Lustre: 5969:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 000001000ead4000

LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
previous similar message
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Skipped 2 previous similar messages
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 3
previous similar messages
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to [EMAIL PROTECTED]: error -110(waiting)
LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Skipped 3 previous similar messages
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1162942795, 100s ago)
LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 7
previous similar messages
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
11    up     8     8     8    -1    -1 3064
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
7    up     8     8     8     3    -1 1368
Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]
8    up     8     8     8     2     1 1792
LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with [EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
