On a file system that's been up for only 57 days, I have:

505 lustre-log.   dumps.

The problem at hand is that a user has many jobs that are now
hung trying to create a directory from his PBS script. On the
clients I see:

LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_connect operation failed with -16
LustreError: Skipped 2 previous similar messages

This appears on every client his jobs are running on.
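Lustre reports failures as negative errno values, so the -16 in both the client and MDS messages can be decoded with Python's standard errno table. A minimal sketch (the helper name `describe_rc` is hypothetical); on Linux, errno 16 is EBUSY, which fits the MDS complaint that the export is "still busy".

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a kernel-style return code such as -16 into its errno name and text."""
    code = abs(rc)  # Lustre logs errors as negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # Decode the rc value seen in "mds_connect operation failed with -16"
    print(describe_rc(-16))
```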

In the most recent /tmp/lustre-log. on the MDS/MGS, I see these messages:

@@@ processing error (-16)  [EMAIL PROTECTED] x12808293/t0 o38->[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
ldlm_lib.c
target_handle_reconnect
nobackup-MDT0000: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting
ldlm_lib.c
target_handle_connect
nobackup-MDT0000: refuse reconnection from 34b4fbea-200b-1f7c-[EMAIL PROTECTED]@tcp to 0x00000100069a7000; still busy with 2 active RPCs
ldlm_lib.c
target_send_reply_msg
@@@ processing error (-16)  [EMAIL PROTECTED] x11199816/t0 o38->[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0


I see messages about active RPCs in other logs as well. What does
this mean? Is something stuck somewhere?



Brock Palen
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985


_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
