We're running the latest Lustre (1.8.4) with kernel 2.6.18-194.3.1.el5 on a machine I call our primary client, since it is what exposes the file system for others to use via NFS/Samba. Today the machine seemingly rebooted on its own, and checking the logs I see these messages:
Nov 22 12:25:52 cajal kernel: LustreError: 3909:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Skipped 8 previous similar messages
Nov 22 12:25:52 cajal kernel: LustreError: 11b-b: Connection to 192.168.5....@tcp at host 192.168.5.101 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.5....@tcp one of its NIDs?
Nov 22 12:25:52 cajal kernel: LustreError: Skipped 8 previous similar messages
Nov 22 12:31:22 cajal kernel: LustreError: 5870:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 565846402
Nov 22 12:31:22 cajal kernel: LustreError: 5870:0:(llite_nfs.c:96:search_inode_for_lustre()) Skipped 490 previous similar messages
Nov 22 12:33:40 cajal mountd[5959]: /lustre/home and /home have same filehandle for 10.0.0.0/255.0.0.0, using first
Nov 22 12:36:31 cajal kernel: LustreError: 3908:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.5.101
Nov 22 12:36:31 cajal kernel: LustreError: 3908:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Skipped 9 previous similar messages
Nov 22 12:36:31 cajal kernel: LustreError: 11b-b: Connection to 192.168.5....@tcp at host 192.168.5.101 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.5....@tcp one of its NIDs?
Nov 22 12:36:31 cajal kernel: LustreError: Skipped 9 previous similar messages
Nov 22 12:37:34 cajal mountd[5959]: authenticated mount request from 129.115.117.22:723 for /lustre/home/qyu926 (/lustre/home)
Nov 22 12:38:38 cajal mountd[5959]: /lustre/home and /home have same filehandle for 129.115.0.0/255.255.0.0, using first
Nov 22 12:40:20 cajal rpc.idmapd[3669]: nss_getpwnam: name '500' does not map into domain 'cbi.utsa.edu'
Nov 22 12:41:23 cajal kernel: LustreError: 5466:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 565846402
Nov 22 12:41:23 cajal kernel: LustreError: 5466:0:(llite_nfs.c:96:search_inode_for_lustre()) Skipped 503 previous similar messages

This is the last entry before the system reboots and the normal kernel boot messages begin.

This is what I see on 192.168.5.101:

Nov 22 12:25:22 data2 kernel: LustreError: 4726:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 129.115.117.8
Nov 22 12:25:22 data2 kernel: LustreError: 4726:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Skipped 8 previous similar messages
Nov 22 12:36:02 data2 kernel: LustreError: 4725:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 129.115.117.8
Nov 22 12:36:02 data2 kernel: LustreError: 4725:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Skipped 9 previous similar messages
Nov 22 12:43:39 data2 kernel: Lustre: 23762:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1351344462337868 sent from lustre-OST0002 to NID 129.115.11...@tcp 7s ago has timed out (7s prior to deadline).
Nov 22 12:43:39 data2 kernel: r...@ffff81004c42d800 x1351344462337868/t0 o104->@NET_0x2000081737508_UUID:15/16 lens 296/384 e 0 to 1 dl 1290451419 ref 1 fl Rpc:N/0/0 rc 0/0
Nov 22 12:43:39 data2 kernel: LustreError: 138-a: lustre-OST0002: A client on nid 129.115.11...@tcp was evicted due to a lock blocking callback to 129.115.11...@tcp timed out: rc -107
Nov 22 12:44:38 data2 kernel: Lustre: 23569:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1351344462337882 sent from lustre-OST0003 to NID 129.115.11...@tcp 0s ago has failed due to network error (7s prior to deadline).
Nov 22 12:44:38 data2 kernel: r...@ffff810111b38400 x1351344462337882/t0 o104->@NET_0x2000081737508_UUID:15/16 lens 296/384 e 0 to 1 dl 1290451485 ref 1 fl Rpc:N/0/0 rc 0/0
Nov 22 12:44:38 data2 kernel: LustreError: 138-a: lustre-OST0003: A client on nid 129.115.11...@tcp was evicted due to a lock blocking callback to 129.115.11...@tcp timed out: rc -107

What's going on?

Thanks,
David

--
Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
