On Thu, 10 Jan 2008, Oleg Drokin wrote:

>> Ignoring prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED]
>> down 4829687047 seconds in the future
>
> This is harmless message that would be shut in 1.6.5
> You can see details in bug 14300

OK.

> As for your original message - hard to tell what caused it. We can see
> that servers decided the client was unresponsive.
> Could it be some network packet lost for example?
> Were not there any other messages at around 12:20 and before that
> (that's when it was evicted) on a client?
> Because at 12:40 - that's already 20 minutes past eviction.

That's the weird thing - there's nothing Lustre-related logged before
that on the client that day! The client seems oblivious to the fact
that it's been evicted, and this was while it was doing IO... Also, the
clocks are synced by NTP, and thus not off by much...

I could accept network errors etc. as an explanation, but then I would
have expected the client to log something, try reconnecting, etc...
As it was, it was simply dead in the water until I rebooted it.

What mechanism does Lustre use to check if a peer is up? Since lctl
ping worked between all nodes, I suspect it uses something more
involved. Can I trigger the same check using lctl?


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    [EMAIL PROTECTED]
---------------------------------------------------------------------------
  "Wow, Veronica, he totally wants to protect and serve you." - Meg
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
