Hello, is it possible to optimize Lustre so that it supports really large directories (with 30k small files in it)? We have 8 physical clients which process JPEG files stored on a Lustre volume, and sooner or later a client freezes: ls in the Lustre directory waits forever. Is there something I could do to improve performance?

The Lustre server is Build Version: 1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19
The Lustre client is Build Version: 1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

I got the following messages on the client:

Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service stable-MDT0000 via nid x.x....@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 37 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped 37 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 8s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 13s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 18s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x....@tcp. The mds_connect operation failed with -16
Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to NID x.x....@tcp 100s ago has timed out (limit 100s).
Lustre: Skipped 9 previous similar messages
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection to service stable-OST0001 via nid x.x....@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
LustreError: 128:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 128:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection restored to service stable-OST0001 using nid x.x....@tcp.
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 23s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 166-1: mgcx.x....@tcp: Connection to service MGS via nid x.x....@tcp was lost; in progress operations using this service will fail.
Lustre: mgcx.x....@tcp: Reactivating import
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 28s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 33s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x....@tcp. The mds_connect operation failed with -16
LustreError: Skipped 5 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 38s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 3158:0:(events.c:66:request_out_callback()) @@@ type 4, status -5 r...@ffff8801002dd800 x112816084/t0 o103->[email protected]@o2ib:17/18 lens 648/256 e 0 to 1 dl 1253177767 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 194 previous similar messages

-- 
Lukáš Hejtmánek
