On Fri, 2009-03-06 at 20:09 +0100, Thomas Roth wrote:
> 
> But this is not what our users observe. Even on an otherwise perfectly
> working system, they report I/O errors on access to some files.

EIO == eviction.

> I can usually see something happening in the logs of OST and client:
> The OST starts with "timeout on bulk PUT after 6+0s", which the OST is
> first "ignoring bulk IO comm error" in the hope that "client will
> retry".

Wait a minute.  This thread is about server recovery, not communications
failures.  You are mixing up errors and situations here.

Communications failures will result in timeouts on the server and that
will result in evictions which will result in EIOs for your
applications.  This has got nothing to do with server recovery though.
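
To make that chain concrete from the application side, here is a minimal
sketch of how a program might tell an EIO apart from other errors and retry
the read. It is only an illustration: the function name, the retry count, the
delay, and the "opener" parameter are all made up for this example, and it
assumes the client reconnects after the eviction so that a later attempt can
succeed. It uses nothing Lustre-specific, only the errno the application sees.

```python
import errno
import time


def read_with_eio_retry(path, retries=3, delay=1.0, opener=open):
    """Read a whole file, retrying only when the OS reports EIO.

    Hypothetical helper: 'retries', 'delay' and 'opener' are illustrative
    parameters, not part of any Lustre or system API.
    """
    for attempt in range(retries + 1):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as exc:
            # Errors other than EIO (ENOENT, EACCES, ...) are not the
            # eviction case discussed here, so pass them straight through.
            # Also give up once the retry budget is used.
            if exc.errno != errno.EIO or attempt == retries:
                raise
            # Assumption: waiting gives the client time to reconnect.
            time.sleep(delay)
```

Retrying like this only hides the symptom from the application. The timeouts
that lead to the eviction still need to be tracked down.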

> "Request ... has timed out
> (limit 7s)", "Connection to service was lost; in progress operations
> using this service will fail", finally "Connection restored to service".

Yes.  This is a timeout and has nothing to do with the subject of server
recovery.

b.
