Brian J. Murrell wrote:
> On Fri, 2009-03-06 at 20:09 +0100, Thomas Roth wrote:
>> But this is not what our users observe. Even on an otherwise perfectly
>> working system, they report I/O errors on access to some files.
>
> EIO == eviction.
>
>> I can usually see something happening in the logs of OST and client:
>> The OST starts with "timeout on bulk PUT after 6+0s", which the OST is
>> first "ignoring bulk IO comm error" in the hope that "client will
>> retry".
>
> Wait a minute. This thread is about server recovery, not communications
> failures. You are mixing up errors and situations here.
>
> Communications failures will result in timeouts on the server and that
> will result in evictions which will result in EIOs for your
> applications. This has got nothing to do with server recovery though.

You are right, of course; this comes from a different situation.
My reasoning was: if a client cannot cope with a 1-second interruption
caused by a communication failure, and returns an EIO, how can it (or
rather the application) survive an interruption of the entire system
lasting several hours?
Of course, if the client reacts differently during server recovery, then
the application will also see things differently. I guess that's what I
misunderstood.
In fact, the client's logs during yesterday's recovery don't look so bad
at all ;-) Just a number of "Request xyz sent from MDT0000-mdc to NID MGS
... timed out", as expected.

Thanks for pointing this out,
Thomas

>> "Request ... has timed out
>> (limit 7s)", "Connection to service was lost; in progress operations
>> using this service will fail", finally "Connection restored to service".
>
> Yes. This is a timeout and nothing to do with the subject of server
> recovery.
>
> b.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
