On Wed, 2008-08-06 at 09:29 -0600, Chris Worley wrote:
> On Wed, Aug 6, 2008 at 9:15 AM, Brian J. Murrell <[EMAIL PROTECTED]> wrote:
> >
> > So, now what does the MDS serving lfs-MDT0000 say about this?  Why did
> > it evict?  What version of Lustre is this?  Perhaps you said so already
> > and I have just forgotten.
> 
> 1.6.5.1 clients w/ 1.6.4.3 OSS's.
> 
> The MDS is very verbose.  I get these all the time, even prior to the error:
> 
> Lustre: lfs-OST0000: haven't heard from client 12f00621-096c-b331-8774-abfc72dfd822
> (at [EMAIL PROTECTED]) in 92 seconds. I think it's dead, and I am evicting it.

Yup.  If you can correlate those kinds of messages (they have the client
IP address in them) with the errors on the client, you have your eviction
events.
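As a rough illustration of that correlation step, here is a small Python sketch that pulls the target, client UUID, address and silence duration out of an eviction message in the format quoted above. The address is redacted in the archive; the sketch assumes it is a Lustre NID of the form address@network (the 192.168.1.10@tcp value below is made up), and parse_eviction is a hypothetical helper, not part of Lustre.

```python
import re

# Matches the server-side eviction message quoted above. The "(at ...)" part
# is assumed to be a Lustre NID such as "192.168.1.10@tcp" (hypothetical).
EVICT_RE = re.compile(
    r"Lustre: (?P<target>\S+): haven't heard from client (?P<uuid>[0-9a-f-]+)"
    r" \(at (?P<nid>[^)]+)\) in (?P<secs>\d+) seconds\."
)


def parse_eviction(line):
    """Return a dict describing an eviction message, or None if it isn't one."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    return {
        "target": m.group("target"),
        "uuid": m.group("uuid"),
        "nid": m.group("nid"),
        # Drop the "@network" suffix to get the bare address seen on the client.
        "addr": m.group("nid").split("@", 1)[0],
        "silent_secs": int(m.group("secs")),
    }


line = ("Lustre: lfs-OST0000: haven't heard from client "
        "12f00621-096c-b331-8774-abfc72dfd822 (at 192.168.1.10@tcp) "
        "in 92 seconds. I think it's dead, and I am evicting it.")
event = parse_eviction(line)
```

The resulting "addr" field is what you would match against the address of the client that reported errors.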

I notice that you are getting messages out of dmesg rather than syslog.
Syslog makes correlation easier and more definite because every entry
carries a timestamp.
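To show why the timestamps matter, the following Python sketch pairs client-side errors with server-side evictions of the same address that happened within a time window. It assumes traditional BSD-style syslog lines that begin with a fixed 15-character timestamp such as "Aug  6 09:15:02" (which carries no year, so the caller supplies one), reasonably synchronized clocks on clients and servers, and an arbitrary 120-second window; syslog_time and correlate are hypothetical helpers.

```python
from datetime import datetime, timedelta


def syslog_time(line, year):
    """Parse the leading BSD-syslog timestamp, e.g. 'Aug  6 09:15:02'.

    The timestamp has no year, so it is supplied by the caller.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def correlate(evictions, client_errors, window=timedelta(seconds=120)):
    """Pair each client-side error with server evictions of the same address.

    Both arguments are lists of (datetime, address) tuples. The default
    120-second window is an illustrative choice, not a Lustre setting.
    """
    matches = []
    for err_time, err_addr in client_errors:
        for ev_time, ev_addr in evictions:
            if ev_addr == err_addr and abs(err_time - ev_time) <= window:
                matches.append((err_time, err_addr, ev_time))
    return matches


evictions = [(syslog_time("Aug  6 09:15:02 oss1 kernel: Lustre: ...", 2008),
              "192.168.1.10")]
client_errors = [
    (datetime(2008, 8, 6, 9, 16, 0), "192.168.1.10"),  # same client, 58 s later
    (datetime(2008, 8, 6, 9, 16, 0), "192.168.1.11"),  # different client
    (datetime(2008, 8, 6, 10, 0, 0), "192.168.1.10"),  # same client, too late
]
matches = correlate(evictions, client_errors)
```

With dmesg output there is usually no wall-clock time to compare, which is exactly what makes this kind of matching guesswork.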

But this kind of eviction is simply due to clients that are unresponsive
from the MDS's point of view.  They are neither making filesystem RPCs nor
"ping"ing (sending keepalives), so the MDS assumes they have died and
evicts them, both to reclaim any locks they may be holding and to keep a
dead client from holding up other, living clients.
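The mechanism described above can be modeled with a toy Python sketch: any RPC or ping refreshes a client's last-heard time, and clients silent for longer than a timeout are evicted and their locks reclaimed. This is a simplified illustration, not Lustre's implementation; the LivenessTracker class, its methods and the 90-second timeout used below are all hypothetical and do not reflect Lustre's actual timeout or ping interval.

```python
import time


class LivenessTracker:
    """Toy model of server-side client liveness (hypothetical, not Lustre code)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout   # seconds of silence tolerated before eviction
        self.clock = clock       # injectable so the model can be tested
        self.last_heard = {}     # client id -> time of last RPC or ping
        self.locks = {}          # client id -> set of lock names held

    def heard_from(self, client):
        # Called for every request, whether a filesystem RPC or a keepalive ping.
        self.last_heard[client] = self.clock()

    def grant_lock(self, client, lock):
        # Taking a lock is itself an RPC, so it also counts as hearing from the client.
        self.heard_from(client)
        self.locks.setdefault(client, set()).add(lock)

    def evict_silent(self):
        """Evict clients silent longer than the timeout; return their reclaimed locks."""
        now = self.clock()
        reclaimed = {}
        for client, last in list(self.last_heard.items()):
            if now - last > self.timeout:
                reclaimed[client] = self.locks.pop(client, set())
                del self.last_heard[client]
        return reclaimed


now = [0.0]
tracker = LivenessTracker(timeout=90, clock=lambda: now[0])
tracker.grant_lock("client-a", "lock-1")
tracker.heard_from("client-b")
now[0] = 50.0
tracker.heard_from("client-b")   # client-b keeps pinging; client-a goes silent
now[0] = 95.0
reclaimed = tracker.evict_silent()
```

The clock is injected so the behavior can be exercised without actually waiting; here client-a, silent for 95 seconds, is evicted and its lock reclaimed, while client-b, heard from 45 seconds earlier, survives.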

So you need to investigate why the clients are dying, or why they appear
dead (i.e., go silent) to the MDS.

b.


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
