On a file system that's been up for only 57 days, I have 505 lustre-log. dumps.

The problem at hand is that a user has many jobs that are now hung trying to create a directory from his PBS script. On every client his jobs are running on, I see:

  LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16
  LustreError: Skipped 2 previous similar messages

In the most recent /tmp/lustre-log. on the MDS/MGS I see these messages:

  @@@ processing error (-16) [EMAIL PROTECTED] x12808293/t0 o38->[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
  ldlm_lib.c target_handle_reconnect nobackup-MDT0000: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting
  ldlm_lib.c target_handle_connect nobackup-MDT0000: refuse reconnection from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x00000100069a7000; still busy with 2 active RPCs
  ldlm_lib.c target_send_reply_msg @@@ processing error (-16) [EMAIL PROTECTED] x11199816/t0 o38->[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0

I see messages about active RPCs in other logs as well. What would this mean? Is something stuck someplace?

Brock Palen
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985
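
P.S. For anyone reading the logs above: the -16 looks like a negated Linux errno, which would make it EBUSY. That matches the "still busy with 2 active RPCs" text. Below is a minimal Python sketch for decoding the return code and pulling the active-RPC count out of a refusal line. The helper names (describe_rc, active_rpcs) are made up for illustration and are not part of Lustre, and the regular expression assumes the message wording shown in this post.

```python
import errno
import os
import re

# Hypothetical helper (not a Lustre tool): turn a negative kernel-style
# return code such as -16 into its symbolic errno name and description.
def describe_rc(rc: int) -> str:
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

# Assumption: the refusal message always uses this exact wording,
# as in the MDS log excerpt quoted in this post.
BUSY_RE = re.compile(r"still busy with (\d+) active RPCs")

def active_rpcs(line: str):
    """Return the active RPC count from a refusal line, or None if absent."""
    match = BUSY_RE.search(line)
    return int(match.group(1)) if match else None

# Line copied from the log excerpt in this post (addresses already redacted).
sample = (
    "ldlm_lib.c target_handle_connect nobackup-MDT0000: refuse reconnection "
    "from 34b4fbea-200b-1f7c- [EMAIL PROTECTED]@tcp to 0x00000100069a7000; "
    "still busy with 2 active RPCs"
)

if __name__ == "__main__":
    print(describe_rc(-16))
    print(active_rpcs(sample))
```

Running the same pattern over each line of a log dump would show whether the active-RPC count ever drops for that client, which might help tell a slow request from a stuck one.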