On Fri, 2009-03-06 at 10:45 +0100, Thomas Roth wrote:
> Thanks Brian.

NP.

> What I meant: the average batch job that wants to read from or write to
> Lustre will abort if a file cannot be accessed. The reason doesn't
> matter to the jobs or the user.

That may be so, but what I am saying is that when a Lustre client wants
to perform an I/O operation on behalf of an application running on that
machine, and the target it wants to do the I/O with is down, the Lustre
client will wait and block the application's I/O indefinitely. That
means that unless the application has some kind of timer in it so that
it can abort the read(2)/write(2), it will wait forever: the read(2) or
write(2) system call that it issued simply waits for the Lustre client
to complete, and that never happens if the target the client wanted to
do the I/O with never comes back.

> So the Lustre client may wait forever, but for the users that is
> irrelevant, they have to resubmit their jobs in any case.

But what signals them to resubmit? A job waiting on I/O to a missing
target will just "hang" (the proper term is block) until the target
comes back. Is there some kind of timer that aborts a job if it takes
too long? If so, then that is pretty orthogonal to the discussion of
what happens to a Lustre client during (a failed) recovery.

> I was wondering whether a client whose transactions have not been
> replayed may get into some zombie state.

No. It should be evicted (that is why the transactions are not
replayed) and will reconnect once recovery has been aborted and the
target resumes its normal (FULL) state.

> Of course I see in the logs of
> MDS and clients what is supposed to happen, that remaining stuff on the
> client is discarded, inodes deleted etc. In some cases this will not
> work, I'm sure. But then reboot of the client will clean up.

A reboot of the client should never be necessary to return it to the
filesystem.

b.
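
For illustration only, here is a minimal sketch of the kind of job-level
timer asked about above: a wrapper that runs a job as a child process and
gives up on it after a deadline. The function name run_with_deadline and
the deadline values are hypothetical, not part of Lustre or of any batch
system. It also assumes the blocked process can actually be killed; if the
child is stuck in an uninterruptible wait inside the kernel, the wrapper
will block too, because it waits for the child to exit after killing it.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd as a child process.

    Returns the child's exit code, or None if the child was killed
    because it did not finish within deadline_s seconds.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # subprocess.run() kills the child and raises TimeoutExpired
        # once the timeout elapses.
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a job blocked on I/O: a child that sleeps far
    # longer than the deadline we allow it.
    job = [sys.executable, "-c", "import time; time.sleep(30)"]
    if run_with_deadline(job, deadline_s=1) is None:
        print("job exceeded its deadline and was killed; resubmit it")
```

A timer like this only tells the user when to resubmit. It says nothing
about the state of the client, which is still expected to reconnect on
its own once the target is back.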
