In the last day our MDS has been refusing connections too. The logs are the same, and we had to reboot the MDS server. What is the reason for this?
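To compare the two incidents, it may help to check whether the refusals come from a single client or from many. The following is only a minimal sketch, assuming the syslog lines look like the "refuse reconnection from ... to ...; still busy with N active RPCs" message quoted below; the function name `count_refusals` and the regular expression are my own and are not part of Lustre.

```python
import re
from collections import Counter

# Pattern modeled on the log line quoted in this thread.
# The exact field layout is an assumption, not a documented Lustre format.
REFUSE_RE = re.compile(
    r"refuse reconnection from (?P<client>\S+) to \S+; "
    r"still busy with (?P<rpcs>\d+) active RPCs"
)


def count_refusals(lines):
    """Return a Counter mapping client identifier -> refused reconnections."""
    counts = Counter()
    for line in lines:
        match = REFUSE_RE.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts
```

It can be fed an open log file, for example `count_refusals(open(path))`, where `path` is wherever the MDS kernel messages are logged on your system; `.most_common()` on the result then lists the clients that were refused most often.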
2009/3/5 Thomas Roth <[email protected]>

> Hi all,
>
> after running for days without any problems, our MDS is refusing
> cooperation for two hours now.
> The log files show nothing until
> >Mar 5 16:46:24 mds1 kernel: Lustre: 17841:0:(ldlm_lib.c:525:target_handle_reconnect()) MDT0000: 481fa70b-590d-31b6-f621-c6125a54bfff reconnecting
> >Mar 5 16:46:24 mds1 kernel: Lustre: 17841:0:(ldlm_lib.c:760:target_handle_connect()) MDT0000: refuse reconnection from [email protected]@tcp to 0xffff8107ef44a000; still busy with 2 active RPCs
>
> I thought that such a thing would be between the MDT and this particular
> client. However, the log goes on like that with many other clients.
>
> Now the MDS is refusing any connection, bringing the system to a stand
> still.
>
> The situation also triggered the dumping of ca. 130 log dumps to /tmp.
> Most of these are small and contain just
> >Watchdog triggered for pid 17866: it was inactive for 12000s
> >Unable to dump stack because of missing export
>
> A few are larger and contain more complaints about lengthy requests and
> possible timeouts:
> >ptlrpc_server_handle_request Request x75091039 took longer than estimated (42+4208s); client may timeout.
> or
> >ptlrpc_server_handle_request Dropping timed-out request from 12345-140.181.114....@tcp: deadline 1000+923s ago
>
> All of these do not seem critical?
> Maybe all clients have timed out for some reason?
> Even so, I'd assume the MDS to be still responsive, say to a mount
> request from a fresh client, one that does not possibly have any
> leftover transactions pending on it?
>
> Right now the only thing I see to do is to reboot the server. Of course
> not a nice procedure on a system we advertised as stable and reliable to
> our users...
>
> So any help will be much appreciated.
> Regards,
> Thomas
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>

--
(\__/)
( O.o)
( > <)
This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
