The log of that particular volume says:

[2014-02-18 09:43:17.136182] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2014-02-18 09:43:17.136285] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2014-02-18 09:43:18.343409] I [server-handshake.c:571:server_setvolume] 0-teoswitch_default_storage-server: accepted client from xxxxx55.domain.com-2075-2014/02/18-09:43:14:302234-teoswitch_default_storage-client-1-0 (version: 3.3.0)
[2014-02-18 09:43:21.356302] I [server-handshake.c:571:server_setvolume] 0-teoswitch_default_storage-server: accepted client from xxxxx54.domain.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0 (version: 3.3.0)
[2014-02-18 10:38:26.488333] W [socket.c:195:__socket_rwv] 0-tcp.teoswitch_default_storage-server: readv failed (Connection timed out)
[2014-02-18 10:38:26.488431] I [server.c:685:server_rpc_notify] 0-teoswitch_default_storage-server: disconnecting connectionfrom xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488494] I [server-helpers.c:741:server_connection_put] 0-teoswitch_default_storage-server: Shutting down connection xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488541] I [server-helpers.c:629:server_connection_destroy] 0-teoswitch_default_storage-server: destroyed connection of xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
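For sifting through a brick log like the excerpt above, here is a minimal Python sketch that keeps only warning and error entries. The pattern is inferred from the entries shown in this thread (`[timestamp] LEVEL [file:line:function] domain: message`) and is an assumption, not an official GlusterFS log grammar; the names `parse_line` and `problems` are hypothetical helpers.

```python
import re

# Layout inferred from the log excerpt in this thread (an assumption):
# [timestamp] LEVEL [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S+): "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines, levels=("W", "E")):
    """Keep only the entries whose severity letter is in `levels`."""
    kept = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] in levels:
            kept.append(entry)
    return kept
```

Passing an open log file to `problems()` returns the warning and error entries in order, which makes it easier to line up a `readv failed (Connection timed out)` entry with the disconnect messages that follow it.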
When I try to access the folder I get:

[root@hxteo55 ~]# ll /<path>/1001/voicemail/
ls: /<path>/1001/voicemail/: Input/output error

This is the volume info:

Volume Name: teoswitch_default_storage
Type: Distribute
Volume ID: 83c9d6f3-0288-4358-9fdc-b1d062cc8fca
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 12.12.123.54:/<path>/gluster/36779974/teoswitch_default_storage
Brick2: 12.12.123.55:/<path>/gluster/36779974/teoswitch_default_storage

Any ideas?

Marco Zanger
Phone 54 11 5299-5400 (int. 5501)
Clay 2954, C1426DLD, Buenos Aires, Argentina
Think Green - Please do not print this email unless you really need to

-----Original Message-----
From: Vijay Bellur [mailto:[email protected]]
Sent: Tuesday, February 18, 2014 03:56 AM
To: Marco Zanger; [email protected]
Subject: Re: [Gluster-users] Node down and volumes unreachable

On 02/17/2014 11:19 PM, Marco Zanger wrote:
> Read/write operations hang for a long period of time (too long). I've
> seen it in that state (waiting) for something like 5 minutes, which
> makes every application fail trying to read or write. These are the
> errors I found in the logs on server A, which is still accessible
> (B was down):
>
> etc-glusterfs-glusterd.vol.log
>
> ...
> [2014-01-31 07:56:49.780247] W [socket.c:1512:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24007)
> [2014-01-31 07:58:25.965783] E [socket.c:1715:socket_connect_finish] 0-management: connection to <SERVER_B_IP>:24007 failed (No route to host)
> [2014-01-31 08:59:33.923250] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
> [2014-01-31 08:59:33.923289] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
> ...
>
>
> glustershd.log
>
> [2014-01-27 12:07:03.644849] W [socket.c:1512:__socket_proto_state_machine] 0-teoswitch_custom_music-client-1: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24010)
> [2014-01-27 12:07:03.644888] I [client.c:2090:client_rpc_notify] 0-teoswitch_custom_music-client-1: disconnected
> [2014-01-27 12:09:35.553628] E [socket.c:1715:socket_connect_finish] 0-teoswitch_greetings-client-1: connection to <SERVER_B_IP>:24011 failed (Connection timed out)
> [2014-01-27 12:10:13.588148] E [socket.c:1715:socket_connect_finish] 0-license_path-client-1: connection to <SERVER_B_IP>:24013 failed (Connection timed out)
> [2014-01-27 12:10:15.593699] E [socket.c:1715:socket_connect_finish] 0-upload_path-client-1: connection to <SERVER_B_IP>:24009 failed (Connection timed out)
> [2014-01-27 12:10:21.601670] E [socket.c:1715:socket_connect_finish] 0-teoswitch_ivr_greetings-client-1: connection to <SERVER_B_IP>:24012 failed (Connection timed out)
> [2014-01-27 12:10:23.607312] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
> [2014-01-27 12:11:21.866604] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_ivr_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.867874] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.868134] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_custom_music-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869417] E [afr-self-heald.c:418:_crawl_proceed] 0-license_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869659] E [afr-self-heald.c:418:_crawl_proceed] 0-upload_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:12:53.948154] I [client-handshake.c:1636:select_server_supported_programs] 0-teoswitch_greetings-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
> [2014-01-27 12:12:53.952894] I [client-handshake.c:1433:client_setvolume_cbk] 0-teoswitch_greetings-client-1: Connected to <SERVER_B_IP>:24011, attached to remote volume
>
> In nfs.log there are lots of errors, but the one that recurs most is this:
>
> [2014-01-27 12:12:27.136033] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
>
> Any ideas? From the logs I see nothing beyond confirmation that A cannot
> reach B, which makes sense since B is down. But A is not, and its volume
> should still be accessible. Right?

Nothing very obvious from these logs. Can you share relevant portions of the client log file? Usually the name of the mount point would be a part of the client log file name.

-Vijay

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
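
As a hedged illustration of the hint about client log names: on a default install, the native (FUSE) client usually writes its log under /var/log/glusterfs, naming the file after the mount point with the slashes turned into hyphens. The sketch below assumes that convention and that default directory; `client_log_path` and the mount point `/mnt/storage` are hypothetical, and a mount that was given an explicit log file option will not follow this pattern.

```python
import posixpath


def client_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Guess the client log file for a mount point.

    Assumes the common default: <log_dir>/<mount path with '/' -> '-'>.log
    """
    # Normalize away trailing slashes, then drop the leading one.
    normalized = posixpath.normpath(mount_point).strip("/")
    return posixpath.join(log_dir, normalized.replace("/", "-") + ".log")


# Hypothetical mount point, for illustration only.
print(client_log_path("/mnt/storage"))
```

If no file with the guessed name exists, listing the log directory and looking for a name that resembles the mount path is the fallback.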
