Hi Andreas,

Thanks. Disabling debug on the clients made a noticeable difference for small-file operations.

There are quite a lot of Lustre-related errors in the logs, but the file system is working fine.


Now I am seeing many of the following error messages and stack dumps in the logs, as well as many files
created as /tmp/lustre.log.-----

   Could disabling debug on a running client cause this?

Thanks very much


Balagopal



LustreError: 3396:0:(osc_create.c:82:osc_interpret_create()) @@@ unknown rc -11 from async create: failing oscc [EMAIL PROTECTED] x12281982/t0 o5->[EMAIL PROTECTED]@tcp:28 lens 336/336 ref 1 fl Complete:R/2/0 rc -11/-11
Lustre: scratch-OST0000-osc: Connection to service scratch-OST0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 3397:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -16 [EMAIL PROTECTED] x12282014/t0 o8->[EMAIL PROTECTED]@tcp:28 lens 304/328 ref 1 fl Rpc:R/0/0 rc 0/-16
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 3563: it was inactive for 100s
Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 3563
ll_mdt_17     S 0000000000000000     0  3563      1          3564  3562 (L-TLB)
0000010074ec3238 0000000000000046 0000010077ff8800 0000010000000073
     0000010074ec31c8 00000000801331f6 0000010001029aa0 0000000100000008
     000001007ce15030 000000000000009d
Call Trace:<ffffffffa049e528>{:osc:oscc_has_objects+259} <ffffffffa049e9a6>{:osc:oscc_precreate+580} <ffffffff801331a5>{default_wake_function+0} <ffffffffa049f86b>{:osc:osc_create+3297} <ffffffffa0364a76>{:lov:qos_prep_create+3728} <ffffffffa035bbe4>{:lov:lov_prep_create_set+541} <ffffffffa0348d59>{:lov:lov_create+6516} <ffffffffa03dd4af>{:ldiskfs:ldiskfs_xattr_ibody_get+403}
     <ffffffffa0429f3d>{:fsfilt_ldiskfs:fsfilt_ldiskfs_get_md+101}
<ffffffffa04415c1>{:mds:mds_get_md+105} <ffffffffa046155b>{:mds:mds_create_objects+3818} <ffffffffa04421c2>{:mds:mds_pack_md+409} <ffffffffa04631aa>{:mds:mds_finish_open+704} <ffffffffa0465625>{:mds:mds_open+6486} <ffffffffa0459c91>{:mds:mds_reint_rec+373} <ffffffffa0443e26>{:mds:mds_reint+637} <ffffffffa044c22a>{:mds:mds_intent_policy+890}
     <ffffffffa0278eee>{:ptlrpc:ldlm_resource_putref+356}
<ffffffffa0274ca2>{:ptlrpc:ldlm_lock_create+1375} <ffffffffa0275d85>{:ptlrpc:ldlm_lock_enqueue+208}
     <ffffffffa02887d7>{:ptlrpc:ldlm_handle_enqueue+2524}
     <ffffffffa02871ba>{:ptlrpc:ldlm_server_blocking_ast+0}
     <ffffffffa028772d>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa04483a9>{:mds:mds_handle+15254} <ffffffffa01b3f69>{:libcfs:libcfs_debug_vmsg2+1713}
     <ffffffff801e9ce7>{vsnprintf+1406} <ffffffff801e9dca>{snprintf+131}
     <ffffffffa02a1e49>{:ptlrpc:ptlrpc_server_handle_request+2514}
     <ffffffffa02a1e57>{:ptlrpc:ptlrpc_server_handle_request+2528}
<ffffffff8013f100>{__mod_timer+293} <ffffffffa02a3d1b>{:ptlrpc:ptlrpc_main+2018} <ffffffff801331a5>{default_wake_function+0} <ffffffffa02a297a>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa02a297a>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff80110e23>{child_rip+8} <ffffffffa02a3539>{:ptlrpc:ptlrpc_main+0} <ffffffff80110e1b>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1182448666.3563
Lustre: scratch-OST0000-osc: Connection restored to service scratch-OST0000 using nid [EMAIL PROTECTED]
Lustre: scratch-OST0000: received MDS connection from [EMAIL PROTECTED]
Lustre: 3650:0:(recov_thread.c:565:llog_repl_connect()) llcd 000001001d47d000:000001007453e180 not empty
Lustre: MDS scratch-MDT0000: scratch-OST0000_UUID now active, resetting orphans
LustreError: 3479:0:(llog_server.c:435:llog_origin_handle_cancel()) cancel 2 llog-records failed: -22
LustreError: 3705:0:(recov_thread.c:396:log_commit_thread()) commit 000001001d47d000:000001007453e180 drop 2 cookies: rc -22
LustreError: 3563:0:(service.c:668:ptlrpc_server_handle_request()) request 13132122 opc 101 from [EMAIL PROTECTED] processed in 101s trans 38707883 rc 301/301
Lustre: 3563:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 3563 disabled after 101.9196s
LustreError: 0:0:(ldlm_lockd.c:213:waiting_locks_callback()) ### lock callback timer expired: evicting client [EMAIL PROTECTED] nid [EMAIL PROTECTED] ns: filter-scratch-OST0000_UUID lock: 0000010070c69680/0x24c4cddb1a7d1a3a lrc: 2/0,0 mode: PR/PR res: 5587403/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 10020 remote: 0x442628bd10ee4706 expref: 577 pid: 3637
Lustre: 3568:0:(mds_reint.c:124:mds_finish_transno()) commit transaction for disconnected client ae77a3e4-6d9d-217a-6033-fefb69622a73: rc 0
LustreError: 3687:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 62s
LustreError: 3669:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x13248558/t0 o6-><?>@<?>:-1 lens 336/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 3669:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2 previous similar messages
LustreError: 3683:0:(filter.c:1139:filter_parent_lock()) scratch-OST0000: slow parent lock 62s
LustreError: 4950:0:(mds_open.c:1461:mds_close()) @@@ no handle for file close ino 29157575: cookie 0x24c4cddb1a00178a [EMAIL PROTECTED] x13280339/t0 o35->[EMAIL PROTECTED]:-1 lens 296/392 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 4950:0:(mds_open.c:1461:mds_close()) Skipped 1 previous similar message
Lustre: 6370:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 36s
LustreError: 6423:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 80s
LustreError: 6423:0:(lustre_fsfilt.h:205:fsfilt_start_log()) Skipped 79 previous similar messages
LustreError: 4883:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log()) scratch-OST0000: slow journal start 65s
LustreError: 4909:0:(filter_io_26.c:680:filter_commitrw_write()) scratch-OST0000: slow brw_start 65s
LustreError: 4909:0:(filter_io_26.c:680:filter_commitrw_write()) Skipped 7 previous similar messages
LustreError: 4883:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log()) Skipped 11 previous similar messages
LustreError: 4877:0:(filter_io_26.c:732:filter_commitrw_write()) scratch-OST0000: slow direct_io 65s
LustreError: 4877:0:(filter_io_26.c:732:filter_commitrw_write()) Skipped 2 previous similar messages
LustreError: 4887:0:(lustre_fsfilt.h:240:fsfilt_brw_start_log()) scratch-OST0000: slow journal start 66s
LustreError: 4887:0:(filter_io_26.c:680:filter_commitrw_write()) scratch-OST0000: slow brw_start 66s
LustreError: 4887:0:(filter_io_26.c:680:filter_commitrw_write()) Skipped 4 previous similar messages
LustreError: 4909:0:(filter_io_26.c:745:filter_commitrw_write()) scratch-OST0000: slow commitrw com
Andreas Dilger wrote:
On Jun 21, 2007  10:20 -0300, Balagopal Pillai wrote:
I am using Lustre 1.6.0.1 with one OST and 20 clients in an HPC cluster.
The OST/MDT/MGS node has a 16-channel 3ware 9650 controller using RAID-6.

*Now the problem: Lustre performance dips a lot when it comes to small files. Please see the following "fileop -f 5" test comparing NFS and Lustre -

We have had reports of read performance problems with 3ware in the past.

Lustre -
Fileop: File size is 1, Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .      mkdir  rmdir  create   read  write  close  stat  access  chmod  readdir  link  unlink  delete  Total_files
A  5     1654    691     132  14228    719   4874  1987   32737   1718     2506  1262    1340    1608          125


NFS -
Fileop:  File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .      mkdir  rmdir  create    read   write  close  stat  access  chmod  readdir  link  unlink  delete  Total_files
A  5      177    594     459  380747  137392   2282  1219  444312    502     1274   306     513     464          125

Could you please recommend any tunables to get a bit more performance out of Lustre with lots of small files? Lots of small files was a problem in GFS too, though it was still better than NFS.

Try setting the debug level on the clients to zero: "sysctl -w lnet.debug=0".
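As a minimal sketch, to do this on all 20 clients at once (assuming pdsh is installed and a hypothetical hostfile "clients" lists them; a plain ssh loop works just as well):

    # turn off Lustre/LNET debug logging on every client right away
    pdsh -w ^clients 'sysctl -w lnet.debug=0'
    # to keep it off across reboots, append the setting to /etc/sysctl.conf on each client
    # (the lnet module must already be loaded when sysctl.conf is applied for this to stick)
    pdsh -w ^clients 'echo "lnet.debug = 0" >> /etc/sysctl.conf'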

*Also, the read performance of Lustre seems to be a little behind NFS. In the new setup I moved /opt, which holds all the software for users, to Lustre. But software like Matlab, Splus, etc. takes almost a minute to come up. The second time it is very fast, maybe due to caching. So I am thinking of putting /opt
back on NFS. Is it possible to boost the read performance of Lustre a bit?

Turning off debug is the easiest.
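For instance, a trivial check to confirm on a client that the setting took effect:

    sysctl lnet.debug    # should report "lnet.debug = 0" once debug is disabled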

*One last question. In the older Lustre setup (version 1.4.5), I have 5 SCSI drives, each one an OST, making up a single volume. The volume became full, but df still reported that there was 27GB free. There doesn't seem
to be an "lfs df" option in that version of Lustre, so I couldn't see the
individual utilization of each of the 5 OSTs. Is this a striping problem?

In the absence of "lfs df" use /proc to find per-OST usage on the client:

grep '[0-9]' /proc/fs/lustre/osc/*/kbytes*
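As a rough sketch, that can be turned into a per-OST utilization report (assuming the usual kbytestotal/kbytesfree files are present under each osc directory, as they are in 1.4 and 1.6):

    # print % used for each OST as seen from this client
    for d in /proc/fs/lustre/osc/*; do
        [ -f "$d/kbytestotal" ] || continue   # skip entries that are not OSC device dirs
        total=$(cat "$d/kbytestotal"); free=$(cat "$d/kbytesfree")
        echo "$(basename "$d"): $(( (total - free) * 100 / total ))% used"
    done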

Lustre 1.6 should be better at balancing the OST usage than 1.4.
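If one full OST does turn out to be the issue, it may also be worth checking the default stripe count on the busiest directories; a sketch using the positional "lfs setstripe" arguments from that era (the path is only an example, and the syntax should be double-checked against "lfs help" on your version):

    lfs getstripe /mnt/lustre/somedir          # show current striping for the directory and its files
    lfs setstripe /mnt/lustre/somedir 0 -1 -1  # default stripe size, any starting OST, stripe new files over all OSTs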

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
