Hi,

Indeed, the client did disconnect from the MDS. We see this quite often on other clients during OSS failover as well. Does that mean the MDS is very busy during an OSS failure, so that client <-> MDS connections time out? Is there a way to prevent this, perhaps with some MDS tuning? I don't see why clients should be unable to reach the MDS when only an OSS is having problems.

Cheers,

Wojciech


On 16 Nov 2007, at 15:46, Oleg Drokin wrote:

Hello!

On Nov 16, 2007, at 8:43 AM, Wojciech Turek wrote:
We've seen an LBUG message today. It happened during failover of one
OSS to another one.
Actually, the messages suggest that there was an MDS failover as well.
Can you tell me which messages suggest that? I ask because, as far as I can see, there was no MDS failover. We have failover configured with heartbeat, and I can see that everything stayed on the same server.

Nov 15 22:10:14 darwin kernel: Lustre: ddn_home-MDT0000-mdc-00000100cff22800: Connection restored to service ddn_home-MDT0000 using nid [EMAIL PROTECTED]

This message means that the connection to your MDS was restored.
I cannot tell whether it was actually a failover (sorry, I used the wrong word). What this message does tell me is that this client had disconnected from the MDS and later reconnected to it. Because you were talking about failovers, I assumed the MDS might also have been failed over (given the disconnection), but that is not necessarily the case.

Bye,
    Oleg

Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: [EMAIL PROTECTED]
tel. +441223763517



_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
