The "Disconnected" state moves between nodes seemingly at random, so I picked one node at random and tailed the last several lines of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (is that the right log file?).
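In case it helps with reading a log like this, here is a minimal Python sketch that tallies entries by severity letter and counts "has disconnected from glusterd" notices per peer. It only assumes the line layout visible in this excerpt (`[timestamp] LEVEL [...] message`); the function name `summarize` and both regular expressions are illustrative and not part of GlusterFS. The three sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt in this message:
#   [2015-08-31 09:38:30.419141] I [MSGID: 106004] [...] 0-management: ...
ENTRY = re.compile(r"^\[(\d{4}-\d{2}-\d{2} [\d:.]+)\] ([A-Z]) ")
# Matches the peer-disconnect notice and captures host name and uuid.
PEER_DOWN = re.compile(
    r"Peer <\s*([^>]+)> \(<([0-9a-f-]+)>\).*has disconnected from glusterd"
)


def summarize(lines):
    """Count entries per severity letter and disconnect notices per peer host."""
    levels = Counter()
    disconnects = Counter()
    for line in lines:
        entry = ENTRY.match(line)
        if entry is None:
            continue  # skips 'The message ... repeated N times' summaries
        levels[entry.group(2)] += 1
        peer = PEER_DOWN.search(line)
        if peer:
            disconnects[peer.group(1)] += 1
    return levels, disconnects


# Three lines copied from the log excerpt in this message.
sample = [
    "[2015-08-31 09:38:30.419141] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server62.yq01.local.net> (<3d354922-4bcd-4469-9e2e-559067882217>), in state <Peer in Cluster>, has disconnected from glusterd.",
    "[2015-08-31 09:38:33.373821] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available",
    "[2015-08-31 09:38:27.370367] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server 10.88.153.23:24007 has not responded in the last 30 seconds, disconnecting.",
]
levels, disconnects = summarize(sample)
print(dict(levels))
print(dict(disconnects))
```

Passing an open file object instead of `sample` would summarize a whole log file the same way, provided each entry sits on its own line.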
I can still access the cluster from servers already in the pool; both reading and writing work fine. The log shows a lot of "Failed to set keep-alive: Protocol not available" errors (excerpt below). Thanks.

[2015-08-31 09:38:25.586073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:27.193523] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:27.209085] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:27.370367] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server 10.88.153.23:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:28.803311] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:28.818834] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
The message "I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928" repeated 4 times between [2015-08-31 09:36:30.776194] and [2015-08-31 09:38:06.162677]
The message "I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133" repeated 4 times between [2015-08-31 09:36:32.404743] and [2015-08-31 09:38:07.779594]
[2015-08-31 09:38:30.419141] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server62.yq01.local.net> (<3d354922-4bcd-4469-9e2e-559067882217>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419188] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server52.yq01.local.net> (<6466759d-05eb-406e-9ede-a36dbf26c216>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419299] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133
[2015-08-31 09:38:30.434835] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:32.035177] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 4db788d9-d372-4f57-a0f4-ba11d480013d
[2015-08-31 09:38:33.373803] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:33.373821] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:33.376719] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 70, Protocol not available
[2015-08-31 09:38:33.376735] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:32.050834] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:33.651240] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:33.666825] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:35.267184] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:35.267237] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid argument)
[2015-08-31 09:38:35.267253] I [MSGID: 106006] [glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-08-31 09:38:35.267352] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:35.282829] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:36.877526] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-08-31 09:37:43.506542 (xid=0x1535)
[2015-08-31 09:38:36.877553] E [MSGID: 106167] [glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2015-08-31 09:38:36.877643] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-31 09:37:43.506554 (xid=0x1536)
[2015-08-31 09:38:36.877659] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-08-31 09:38:36.877676] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server6.yq01.local.net> (<eb491a24-3edd-494a-90c0-b4280bd6995e>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.877823] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] ))))) 0-management: Lock for vol speech0 not held
[2015-08-31 09:38:36.877840] W [MSGID: 106118] [glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not released for speech0
[2015-08-31 09:38:36.877889] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server48.yq01.local.net> (<372c820d-003e-4885-870c-547ca17f6770>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.878012] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:38:36.893088] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:37.380052] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:38:37.380071] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492491] W [socket.c:642:__socket_rwv] 0-socket.management: writev on 10.88.155.28:65379 failed (Broken pipe)
[2015-08-31 09:38:38.492510] I [socket.c:2409:socket_event_handler] 0-transport: disconnecting now
[2015-08-31 09:38:38.492565] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 5, Protocol not available
[2015-08-31 09:38:38.492576] W [socket.c:2673:socket_server_event_handler] 0-socket.management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492669] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < worker09.yq01.local.net> (<c0f4eab2-9cdd-4ba8-a002-259456288fd3>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492715] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server53.yq01.local.net> (<b1f15cce-36e4-4ef4-a22f-70bafb0bf8d3>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492786] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 96aa9f85-f979-42a8-ac0a-1136384fbc14
[2015-08-31 09:38:38.508078] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:39.383260] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 27, Protocol not available
[2015-08-31 09:38:39.383280] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:40.108404] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 72e2074f-921d-45d6-9601-deee653075a9
[2015-08-31 09:38:40.124073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:41.386485] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 23, Protocol not available
[2015-08-31 09:38:41.386506] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.389473] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 30, Protocol not available
[2015-08-31 09:38:41.389486] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.733507] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f1c1b3d9-326d-4730-b1b0-788690da2ce1
[2015-08-31 09:38:41.749079] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:43.348570] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 455da276-9ef5-46ab-90f9-457a70432224
[2015-08-31 09:38:43.364074] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:44.964456] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server43.yq01.local.net> (<76cb46d9-5669-47db-b264-68b55d4c37f0>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:44.964578] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 00d5caae-b647-4dae-8d3e-df1e7f08941f
[2015-08-31 09:38:44.980073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:45.392805] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 38, Protocol not available
[2015-08-31 09:38:45.392825] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:46.393009] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server 10.88.155.15:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:46.584515] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e204bc20-9c4f-449c-9dfc-f6e54b96bf8c
[2015-08-31 09:38:46.600079] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:47.396000] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 35, Protocol not available
[2015-08-31 09:38:47.396019] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:48.198525] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 607e3f7a-65e6-423a-9226-5f763f9838e8
[2015-08-31 09:38:48.214089] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:49.815541] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e2322b18-2e5f-4c3c-8cc2-84b137fa7328
[2015-08-31 09:38:49.831078] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:51.434550] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-08-31 09:37:56.464514 (xid=0x1315)
[2015-08-31 09:38:51.434579] E [MSGID: 106167] [glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2015-08-31 09:38:51.434669] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-31 09:37:56.464526 (xid=0x1316)
[2015-08-31 09:38:51.434685] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-08-31 09:38:51.434704] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server42.yq01.local.net> (<0b24198f-dfad-4259-bc22-9f3736f53824>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:51.434850] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] ))))) 0-management: Lock for vol speech0 not held
[2015-08-31 09:38:51.434867] W [MSGID: 106118] [glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not released for speech0
[2015-08-31 09:38:51.434994] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:51.450075] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:53.049543] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928
[2015-08-31 09:38:53.065083] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:54.666534] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:54.682066] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:57.399884] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 45, Protocol not available
[2015-08-31 09:38:57.399906] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:57.402816] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:57.402830] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:56.301076] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:57.897551] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:57.913072] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:59.513520] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:59.529073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:39:01.129419] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer < server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:39:01.129469] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid argument)
[2015-08-31 09:39:01.129484] I [MSGID: 106006] [glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-08-31 09:39:01.129587] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:39:01.145074] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:39:01.406146] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:39:01.406168] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available

2015-08-31 16:54 GMT+08:00 Atin Mukherjee <[email protected]>:

> On 08/31/2015 01:10 PM, Yiping Peng wrote:
> > Hi guys,
> >
> > I've been running GlusterFS for a couple of days and it's been nice and
> > steady, except a minor problem: the peer probing on my relatively large
> > cluster seems to stuck for a long time.
> >
> > Last time atinm told me in IRC (I was barius.2333 in IRC) that a cluster as
> > large as 50+ nodes might take a long time peer probing (o(n^2) time), and
> > now my cluster has expanded to 90+ nodes.
> >
> > The peer probing process was started 4 days ago, when my cluster had ~50
> > nodes. I probed ~40 nodes using subprocess in bash at once, and the
> > commands all successfully returned almost immediately (no time-outs).
> >
> > However the glusterd kept writing to /var/lib/glusterd/peers/ during the
> > last 4 days, and all commands related to newly-added nodes, e.g. add-brick,
> > mount, will time-out and fail. Also, running “gluster peer status” on my
> > nodes shows “Disconnected” nodes that varies over time.
> Peer status should not shows node in disconnected state even if the peer
> handshaking takes longer time, if it does then something is wrong. Could
> you check which node is disconnected and what the glusterd log file on
> that node indicates?
> >
> > What shall I do in such situation? Do I need to wait for the whole peer
> > probing progress to complete, or can I simply kill the glusterd and restart
> > it?
> >
> > Regards,
> >
> > Yiping Peng
> >
> > _______________________________________________
> > Gluster-users mailing list
> > [email protected]
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
