Hi Andreas,
Thanks. Disabling debug on the clients made a good difference to small-file
operations.
There are quite a lot of Lustre-related errors in the logs, although the file
system is working fine.
Now I am seeing many of the following error messages and stack dumps in
the logs, along with many files created as /tmp/lustre-log.-----
Could disabling debug on a running client cause this?
Thanks very much
Balagopal
LustreError: 3396:0:(osc_create.c:82:osc_interpret_create()) @@@ unknown
rc -11 from async create: failing oscc [EMAIL PROTECTED] x12281982/t0
o5->[EMAIL PROTECTED]@tcp:28 lens 336/336 ref 1 fl
Complete:R/2/0 rc -11/-11
Lustre: scratch-OST0000-osc: Connection to service scratch-OST0000 via
nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait
for recovery to complete.
LustreError: 3397:0:(client.c:574:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -16 [EMAIL PROTECTED] x12282014/t0
o8->[EMAIL PROTECTED]@tcp:28 lens 304/328 ref 1 fl
Rpc:R/0/0 rc 0/-16
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 3563:
it was inactive for 100s
Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack
for process 3563
ll_mdt_17 S 0000000000000000 0 3563 1 3564 3562
(L-TLB)
0000010074ec3238 0000000000000046 0000010077ff8800 0000010000000073
0000010074ec31c8 00000000801331f6 0000010001029aa0 0000000100000008
000001007ce15030 000000000000009d
Call Trace:<ffffffffa049e528>{:osc:oscc_has_objects+259}
<ffffffffa049e9a6>{:osc:oscc_precreate+580}
<ffffffff801331a5>{default_wake_function+0}
<ffffffffa049f86b>{:osc:osc_create+3297}
<ffffffffa0364a76>{:lov:qos_prep_create+3728}
<ffffffffa035bbe4>{:lov:lov_prep_create_set+541}
<ffffffffa0348d59>{:lov:lov_create+6516}
<ffffffffa03dd4af>{:ldiskfs:ldiskfs_xattr_ibody_get+403}
<ffffffffa0429f3d>{:fsfilt_ldiskfs:fsfilt_ldiskfs_get_md+101}
<ffffffffa04415c1>{:mds:mds_get_md+105}
<ffffffffa046155b>{:mds:mds_create_objects+3818}
<ffffffffa04421c2>{:mds:mds_pack_md+409}
<ffffffffa04631aa>{:mds:mds_finish_open+704}
<ffffffffa0465625>{:mds:mds_open+6486}
<ffffffffa0459c91>{:mds:mds_reint_rec+373}
<ffffffffa0443e26>{:mds:mds_reint+637}
<ffffffffa044c22a>{:mds:mds_intent_policy+890}
<ffffffffa0278eee>{:ptlrpc:ldlm_resource_putref+356}
<ffffffffa0274ca2>{:ptlrpc:ldlm_lock_create+1375}
<ffffffffa0275d85>{:ptlrpc:ldlm_lock_enqueue+208}
<ffffffffa02887d7>{:ptlrpc:ldlm_handle_enqueue+2524}
<ffffffffa02871ba>{:ptlrpc:ldlm_server_blocking_ast+0}
<ffffffffa028772d>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa04483a9>{:mds:mds_handle+15254}
<ffffffffa01b3f69>{:libcfs:libcfs_debug_vmsg2+1713}
<ffffffff801e9ce7>{vsnprintf+1406} <ffffffff801e9dca>{snprintf+131}
<ffffffffa02a1e49>{:ptlrpc:ptlrpc_server_handle_request+2514}
<ffffffffa02a1e57>{:ptlrpc:ptlrpc_server_handle_request+2528}
<ffffffff8013f100>{__mod_timer+293}
<ffffffffa02a3d1b>{:ptlrpc:ptlrpc_main+2018}
<ffffffff801331a5>{default_wake_function+0}
<ffffffffa02a297a>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa02a297a>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110e23>{child_rip+8}
<ffffffffa02a3539>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110e1b>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1182448666.3563
Lustre: scratch-OST0000-osc: Connection restored to service
scratch-OST0000 using nid [EMAIL PROTECTED]
Lustre: scratch-OST0000: received MDS connection from [EMAIL PROTECTED]
Lustre: 3650:0:(recov_thread.c:565:llog_repl_connect()) llcd
000001001d47d000:000001007453e180 not empty
Lustre: MDS scratch-MDT0000: scratch-OST0000_UUID now active, resetting
orphans
LustreError: 3479:0:(llog_server.c:435:llog_origin_handle_cancel())
cancel 2 llog-records failed: -22
LustreError: 3705:0:(recov_thread.c:396:log_commit_thread()) commit
000001001d47d000:000001007453e180 drop 2 cookies: rc -22
LustreError: 3563:0:(service.c:668:ptlrpc_server_handle_request())
request 13132122 opc 101 from [EMAIL PROTECTED] processed in 101s
trans 38707883 rc 301/301
Lustre: 3563:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for
pid 3563 disabled after 101.9196s
LustreError: 0:0:(ldlm_lockd.c:213:waiting_locks_callback()) ### lock
callback timer expired: evicting client
[EMAIL PROTECTED] nid
[EMAIL PROTECTED] ns: filter-scratch-OST0000_UUID lock:
0000010070c69680/0x24c4cddb1a7d1a3a lrc: 2/0,0 mode: PR/PR res:
5587403/0 rrc: 2 type: EXT [0->18446744073709551615] (req
0->18446744073709551615) flags: 10020 remote: 0x442628bd10ee4706 expref:
577 pid: 3637
Lustre: 3568:0:(mds_reint.c:124:mds_finish_transno()) commit transaction
for disconnected client ae77a3e4-6d9d-217a-6033-fefb69622a73: rc 0
LustreError: 3687:0:(lustre_fsfilt.h:205:fsfilt_start_log())
scratch-OST0000: slow journal start 62s
LustreError: 3669:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@
processing error (-107) [EMAIL PROTECTED] x13248558/t0 o6-><?>@<?>:-1
lens 336/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 3669:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2
previous similar messages
LustreError: 3683:0:(filter.c:1139:filter_parent_lock())
scratch-OST0000: slow parent lock 62s
LustreError: 4950:0:(mds_open.c:1461:mds_close()) @@@ no handle for file
close ino 29157575: cookie 0x24c4cddb1a00178a [EMAIL PROTECTED]
x13280339/t0
o35->[EMAIL PROTECTED]:-1
lens 296/392 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 4950:0:(mds_open.c:1461:mds_close()) Skipped 1 previous
similar message
Lustre: 6370:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000:
slow journal start 36s
LustreError: 6423:0:(lustre_fsfilt.h:205:fsfilt_start_log())
scratch-OST0000: slow journal start 80s
LustreError: 6423:0:(lustre_fsfilt.h:205:fsfilt_start_log()) Skipped 79
previous similar messages
LustreError: 4883:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log())
scratch-OST0000: slow journal start 65s
LustreError: 4909:0:(filter_io_26.c:680:filter_commitrw_write())
scratch-OST0000: slow brw_start 65s
LustreError: 4909:0:(filter_io_26.c:680:filter_commitrw_write()) Skipped
7 previous similar messages
LustreError: 4883:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log()) Skipped
11 previous similar messages
LustreError: 4877:0:(filter_io_26.c:732:filter_commitrw_write())
scratch-OST0000: slow direct_io 65s
LustreError: 4877:0:(filter_io_26.c:732:filter_commitrw_write()) Skipped
2 previous similar messages
LustreError: 4887:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log())
scratch-OST0000: slow journal start 66s
LustreError: 4887:0:(filter_io_26.c:680:filter_commitrw_write())
scratch-OST0000: slow brw_start 66s
LustreError: 4887:0:(filter_io_26.c:680:filter_commitrw_write()) Skipped
4 previous similar messages
LustreError: 4909:0:(filter_io_26.c:745:filter_commitrw_write())
scratch-OST0000: slow commitrw com
Andreas Dilger wrote:
On Jun 21, 2007 10:20 -0300, Balagopal Pillai wrote:
I am using Lustre 1.6.0.1 with one OST and 20 clients in an
HPC cluster.
The OST/MDT/MGS has a 16-channel 3ware 9650 using RAID 6.
*Now the problem: Lustre performance dips a lot when it comes to small
files. Please see the following fileop -f 5 test comparing NFS and Lustre:
We have had reports of read performance problems with 3ware in the past.
Lustre -
Fileop: File size is 1, Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .       mkdir   rmdir  create    read   write   close    stat  access   chmod readdir    link  unlink  delete  Total_files
 A   5    1654     691     132   14228     719    4874    1987   32737    1718    2506    1262    1340    1608          125
NFS -
Fileop: File size is 1, Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .       mkdir   rmdir  create    read   write   close    stat  access   chmod readdir    link  unlink  delete  Total_files
 A   5     177     594     459  380747  137392    2282    1219  444312     502    1274     306     513     464          125
Could you please recommend any tunables to get a bit more
performance out of Lustre with lots of small files? Small-file performance
was bad in GFS too, although GFS was still better than NFS.
Try setting the debug level on the clients to zero: "sysctl -w lnet.debug=0".
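(On a running client that looks roughly like the following; the sysctl.conf
line for persistence is an assumption and only takes effect if the lnet
module is already loaded when sysctl.conf is processed.)

sysctl -w lnet.debug=0       # clear the debug mask at runtime
cat /proc/sys/lnet/debug     # verify the mask is now empty/0
# Assumed persistence step: add "lnet.debug = 0" to /etc/sysctl.conf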
*Also, the read performance of Lustre seems to be a little behind NFS. In
the new setup I moved /opt, which holds all the software for users, to
Lustre, but software like Matlab, Splus etc. takes almost a minute to come
up. The second time it is very fast, presumably due to caching. So I am
thinking of putting /opt back on NFS. Is it possible to boost the read
performance of Lustre a bit?
Turning off debug is the easiest.
*One last question. In the older Lustre setup (version 1.4.5), I have 5
SCSI drives, each one an OST, making up a single volume. The volume became
full, but df still reported 27 GB free. There doesn't seem to be an
"lfs df" option in that version of Lustre, so I couldn't see the
individual utilization of each of the 5 OSTs. Is this a striping problem?
In the absence of "lfs df" use /proc to find per-OST usage on the client:
grep '[0-9]' /proc/fs/lustre/osc/*/kbytes*
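(A sketch of turning that into a per-OST summary; the loop and output format
are an illustration, not from the original reply, but it only reads the same
kbytestotal/kbytesfree files the wildcard above matches.)

for d in /proc/fs/lustre/osc/*/; do
    [ -f "${d}kbytestotal" ] || continue        # skip entries without usage files
    total=`cat "${d}kbytestotal"`
    free=`cat "${d}kbytesfree"`
    used=`expr $total - $free`
    pct=`expr $used \* 100 / $total`
    echo "`basename $d`: $used kB used of $total kB ($pct%)"
done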
Lustre 1.6 should be better at balancing the OST usage than 1.4.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss