We have been running Lustre for a few years now, and today was the first time I came upon something I hadn't seen before. The Lustre partition was mounted and I could access files within it; however, the minute I started opening large files, it became unstable and hung. The system load shot up to 33 (on the headnode client) and Lustre was using approximately 6 GB of memory. I stopped all of our services that write into the Lustre partition and unmounted /lustre. Tailing the logs during this process, I saw:
LustreError: 8620:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 8620:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Skipped 308135 previous similar messages
LustreError: 8620:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 8620:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) Skipped 308135 previous similar messages
LustreError: 8620:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 8620:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Skipped 710099 previous similar messages
LustreError: 8620:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 8620:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) Skipped 710099 previous similar messages

These messages repeated over and over. A few minutes later, Lustre finished unmounting and freed the 6 GB of memory it had been using. I didn't see anything wrong with our OSTs, so I remounted the Lustre partition on the headnode, and now everything is back to normal. I'm wondering what could have caused this in the first place?

Rocks 5 (RHEL5), Lustre 1.6.5.1, Kernel 2.6.18-53.1.14.el5_lustre.1.6.5.1smp

--
Jeremy Mann
[email protected]
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
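One hint worth noting about the log excerpt above: Lustre reports negated kernel errno values, so "rc -108" should correspond to errno 108, which on Linux is ESHUTDOWN ("Cannot send after transport endpoint shutdown") -- i.e. the client's connection to the target had been shut down when it tried to send the lock-cancel RPCs. Assuming a Linux box, the mapping can be checked with a quick Python snippet:

```python
import errno
import os

# Lustre log messages report negated kernel errno values,
# so "rc -108" in the ldlm_cli_cancel_req() lines means errno 108.
rc = -108
code = -rc

print(errno.errorcode[code])  # symbolic name, e.g. ESHUTDOWN on Linux
print(os.strerror(code))      # human-readable description
```

The symbolic name and description come from the host's errno tables, so this reflects the kernel's meaning for the value, not anything Lustre-specific.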
