Hi,

On 2010-11-16, at 7:25 PM, Arne Brutschy wrote:
> Hello,
>
>> From the log, we can see that either your MGS node was not ready for
>> connection yet, or there's a network error between client and the MGS node.
>
> No error on the server nor on the client. What else can it be? Maybe the
> switch is bad, I can see RX errors on most of its interfaces.

The switch could well be the culprit: the error message shows that the client
failed to send its request to the MGS, and the network send status was
-EHOSTUNREACH (the -113 in your log). I suggest you re-examine the network of
your system.

>
>> Were you rebooting the MGS at the moment?
>
> No. It's something that happens regularly.
>
>> Since you said there's no errors on the interface, you need to check
>> the lnet connection and also verify that the MGS/MDT are up and running.
>
> As far as I can tell, everything seems to be set up correctly. I have
> quite a simple setup (single network, single interface gbe).
>
> Thanks
> Arne
>
>> On 2010-11-15, at 11:32 PM, Arne Brutschy wrote:
>>
>>> Hi all,
>>>
>>> I am mounting lustre through an fstab entry. This fails quite often, the
>>> nodes end up without the lustre mount. Even when I log in, it takes 2-3
>>> tries to get it to mount. This is what I get:
>>>
>>> mount /lustre
>>> mount.lustre: mount 10.1....@tcp0:/lustre at /lustre failed: Cannot
>>> send after transport endpoint shutdown
>>>
>>> This is /var/log/messages:
>>>
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError:
>>> 2124:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-10.1....@tcp:
>>> -113
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError:
>>> 2124:0:(events.c:66:request_out_callback()) @@@ type 4, status -113
>>> r...@d73d7c00 x1352468062535684/t0 o250->[email protected]@tcp_0:26/25 lens
>>> 368/584 e 0 to 1 dl 1289834868 ref 2 fl Rpc:N/0/0 rc 0/0
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError:
>>> 29069:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID
>>> r...@d73d7800 x1352468062535685/t0 o101->[email protected]@tcp_0:26/25 lens
>>> 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError: 15c-8:
>>> mgc10.1....@tcp: The configuration from log 'lustre-client' failed (-108).
>>> This may be the result of communication errors between this node and the
>>> MGS, a bad configuration, or other errors. See the syslog for more
>>> information.
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError:
>>> 29069:0:(llite_lib.c:1176:ll_fill_super()) Unable to process log: -108
>>> Nov 15 16:27:43 compute-1-10 kernel: LustreError:
>>> 29069:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-108)
>>>
>>> I have no errors on the interface, so I assume this is a timing problem.
>>> Can I improve this through some timeout setting?
>>>
>>> Cheers,
>>> Arne
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
> --
> Arne Brutschy
> Ph.D. Student                    Email arne.brutschy(AT)ulb.ac.be
> IRIDIA CP 194/6                  Web   iridia.ulb.ac.be/~abrutschy
> Universite' Libre de Bruxelles   Tel   +32 2 650 2273
> Avenue Franklin Roosevelt 50     Fax   +32 2 650 2715
> 1050 Bruxelles, Belgium          (Fax at IRIDIA secretary)
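
P.S. For reading logs like the one quoted in this thread: the negative numbers in the LustreError lines are Linux errno values with the sign flipped. Below is a minimal sketch in Python that pulls them out of a log line and names them. The names `LINUX_ERRNO` and `decode_status` are made up for this illustration and are not part of Lustre, and the table only covers the two codes that appear in this thread (values as defined on Linux).

```python
import re

# Linux errno values for the two codes seen in this thread.
# Hard-coded on purpose so the result does not depend on the
# platform the script runs on. This table is illustrative only.
LINUX_ERRNO = {
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    113: ("EHOSTUNREACH", "No route to host"),
}

# A '-' followed by 1-3 digits, not glued to a preceding word
# character or dot, so "15c-8" and "lib-move.c" are not matched.
_NEG_CODE = re.compile(r"(?<![\w.])-(\d{1,3})\b")


def decode_status(line):
    """Return (code, name, message) for each negative status in a log line.

    Hypothetical helper: codes missing from LINUX_ERRNO are reported
    as ("UNKNOWN", "").
    """
    results = []
    for match in _NEG_CODE.finditer(line):
        code = int(match.group(1))
        name, message = LINUX_ERRNO.get(code, ("UNKNOWN", ""))
        results.append((code, name, message))
    return results


if __name__ == "__main__":
    sample = "request_out_callback()) @@@ type 4, status -113"
    for code, name, message in decode_status(sample):
        print(f"-{code}: {name} ({message})")
```

Read this way, -113 is the "no route to host" condition on the send path, and -108 is the same text that mount.lustre printed ("Cannot send after transport endpoint shutdown"), which is consistent with the mount failing as a consequence of the earlier send error.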
