The log for that particular volume says:

[2014-02-18 09:43:17.136182] W [socket.c:410:__socket_keepalive] 0-socket: 
failed to set keep idle on socket 8
[2014-02-18 09:43:17.136285] W [socket.c:1876:socket_server_event_handler] 
0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2014-02-18 09:43:18.343409] I [server-handshake.c:571:server_setvolume] 
0-teoswitch_default_storage-server: accepted client from 
xxxxx55.domain.com-2075-2014/02/18-09:43:14:302234-teoswitch_default_storage-client-1-0
 (version: 3.3.0)
[2014-02-18 09:43:21.356302] I [server-handshake.c:571:server_setvolume] 
0-teoswitch_default_storage-server: accepted client from 
xxxxx54.domain.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0 
(version: 3.3.0)
[2014-02-18 10:38:26.488333] W [socket.c:195:__socket_rwv] 
0-tcp.teoswitch_default_storage-server: readv failed (Connection timed out)
[2014-02-18 10:38:26.488431] I [server.c:685:server_rpc_notify] 
0-teoswitch_default_storage-server: disconnecting connectionfrom 
xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488494] I [server-helpers.c:741:server_connection_put] 
0-teoswitch_default_storage-server: Shutting down connection 
xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488541] I [server-helpers.c:629:server_connection_destroy] 
0-teoswitch_default_storage-server: destroyed connection of 
xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0

When I try to access the folder, I get:

[root@hxteo55 ~]# ll /<path>/1001/voicemail/
ls: /<path>/1001/voicemail/: Input/output error

This is the volume info:

Volume Name: teoswitch_default_storage
Type: Distribute
Volume ID: 83c9d6f3-0288-4358-9fdc-b1d062cc8fca
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 12.12.123.54:/<path>/gluster/36779974/teoswitch_default_storage
Brick2: 12.12.123.55:/<path>/gluster/36779974/teoswitch_default_storage

Any ideas?


Marco Zanger
Phone 54 11 5299-5400 (int. 5501)
Clay 2954, C1426DLD, Buenos Aires, Argentina
Think Green - Please do not print this email unless you really need to


-----Original Message-----
From: Vijay Bellur [mailto:[email protected]] 
Sent: Tuesday, February 18, 2014 03:56 AM
To: Marco Zanger; [email protected]
Subject: Re: [Gluster-users] Node down and volumes unreachable

On 02/17/2014 11:19 PM, Marco Zanger wrote:
> Read/write operations hang for a long period of time (too long). I've 
> seen it in that state (waiting) for something like 5 minutes, which 
> makes every application fail when trying to read or write. These are the 
> errors I found in the logs on server A, which is still accessible 
> (B was down):
>
> etc-glusterfs-glusterd.vol.log
>
> ...
>   [2014-01-31 07:56:49.780247] W 
> [socket.c:1512:__socket_proto_state_machine] 0-management: reading 
> from socket failed. Error (Connection timed out), peer 
> (<SERVER_B_IP>:24007)
> [2014-01-31 07:58:25.965783] E [socket.c:1715:socket_connect_finish] 
> 0-management: connection to <SERVER_B_IP>:24007 failed (No route to 
> host)
> [2014-01-31 08:59:33.923250] I 
> [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using 
> Program glusterd mgmt, Num (1238433), Version (2)
> [2014-01-31 08:59:33.923289] I 
> [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program 
> Peer mgmt, Num (1238437), Version (2) ...
>
>
> glustershd.log
>
> [2014-01-27 12:07:03.644849] W 
> [socket.c:1512:__socket_proto_state_machine] 
> 0-teoswitch_custom_music-client-1: reading from socket failed. Error 
> (Connection timed out), peer (<SERVER_B_IP>:24010)
> [2014-01-27 12:07:03.644888] I [client.c:2090:client_rpc_notify] 
> 0-teoswitch_custom_music-client-1: disconnected
> [2014-01-27 12:09:35.553628] E [socket.c:1715:socket_connect_finish] 
> 0-teoswitch_greetings-client-1: connection to <SERVER_B_IP>:24011 
> failed (Connection timed out)
> [2014-01-27 12:10:13.588148] E [socket.c:1715:socket_connect_finish] 
> 0-license_path-client-1: connection to <SERVER_B_IP>:24013 failed 
> (Connection timed out)
> [2014-01-27 12:10:15.593699] E [socket.c:1715:socket_connect_finish] 
> 0-upload_path-client-1: connection to <SERVER_B_IP>:24009 failed 
> (Connection timed out)
> [2014-01-27 12:10:21.601670] E [socket.c:1715:socket_connect_finish] 
> 0-teoswitch_ivr_greetings-client-1: connection to <SERVER_B_IP>:24012 
> failed (Connection timed out)
> [2014-01-27 12:10:23.607312] E [socket.c:1715:socket_connect_finish] 
> 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 
> failed (Connection timed out)
> [2014-01-27 12:11:21.866604] E [afr-self-heald.c:418:_crawl_proceed] 
> 0-teoswitch_ivr_greetings-replicate-0: Stopping crawl as < 2 children 
> are up
> [2014-01-27 12:11:21.867874] E [afr-self-heald.c:418:_crawl_proceed] 
> 0-teoswitch_greetings-replicate-0: Stopping crawl as < 2 children are 
> up
> [2014-01-27 12:11:21.868134] E [afr-self-heald.c:418:_crawl_proceed] 
> 0-teoswitch_custom_music-replicate-0: Stopping crawl as < 2 children 
> are up
> [2014-01-27 12:11:21.869417] E [afr-self-heald.c:418:_crawl_proceed] 
> 0-license_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869659] E [afr-self-heald.c:418:_crawl_proceed] 
> 0-upload_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:12:53.948154] I 
> [client-handshake.c:1636:select_server_supported_programs] 
> 0-teoswitch_greetings-client-1: Using Program GlusterFS 3.3.0, Num 
> (1298437), Version (330)
> [2014-01-27 12:12:53.952894] I 
> [client-handshake.c:1433:client_setvolume_cbk] 
> 0-teoswitch_greetings-client-1: Connected to <SERVER_B_IP>:24011, 
> attached to remote volume
>
> nfs.log: there are lots of errors, but the one that recurs most is this:
>
> [2014-01-27 12:12:27.136033] E [socket.c:1715:socket_connect_finish] 
> 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 
> failed (Connection timed out)
>
> Any ideas? The logs only confirm that A cannot reach B, which makes 
> sense since B is down. But A is not, and its volume should still be 
> accessible. Right?

Nothing very obvious from these logs.

Can you share relevant portions of the client log file? Usually the name of the 
mount point is part of the client log file's name.
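
As a rough sketch of that naming convention, the helper below guesses where a 
FUSE client log might live. It assumes the default log directory 
/var/log/glusterfs and that the log is named after the mount path with slashes 
replaced by dashes; the function name and the mount point shown are 
hypothetical, so adjust them to the actual setup.

```python
import posixpath


def guess_client_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Guess the FUSE client log path for a mount point.

    Assumption: the client log is named after the mount path, with the
    leading slash dropped and the remaining slashes replaced by dashes.
    """
    # Normalize the path (removes trailing slashes and duplicate separators).
    normalized = posixpath.normpath(mount_point)
    # Drop the leading slash and turn the remaining separators into dashes.
    name = normalized.strip("/").replace("/", "-")
    return posixpath.join(log_dir, name + ".log")


# Hypothetical mount point, for illustration only.
print(guess_client_log_path("/mnt/teoswitch/storage"))
```

If the volume was mounted with a custom log-file option, the log will be 
wherever that option points instead.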

-Vijay

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
