On 02/17/2014 11:19 PM, Marco Zanger wrote:
Read/write operations hang for a long time (too long). I've seen them stay in 
that waiting state for about 5 minutes, which makes every application that 
tries to read or write fail. These are the errors I found in the logs on 
server A, which is still accessible (B was down):

etc-glusterfs-glusterd.vol.log

...
[2014-01-31 07:56:49.780247] W [socket.c:1512:__socket_proto_state_machine] 
0-management: reading from socket failed. Error (Connection timed out), peer 
(<SERVER_B_IP>:24007)
[2014-01-31 07:58:25.965783] E [socket.c:1715:socket_connect_finish] 0-management: 
connection to <SERVER_B_IP>:24007 failed (No route to host)
[2014-01-31 08:59:33.923250] I 
[glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program 
glusterd mgmt, Num (1238433), Version (2)
[2014-01-31 08:59:33.923289] I 
[glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program 
Peer mgmt, Num (1238437), Version (2)
...


glustershd.log

[2014-01-27 12:07:03.644849] W [socket.c:1512:__socket_proto_state_machine] 
0-teoswitch_custom_music-client-1: reading from socket failed. Error (Connection 
timed out), peer (<SERVER_B_IP>:24010)
[2014-01-27 12:07:03.644888] I [client.c:2090:client_rpc_notify] 
0-teoswitch_custom_music-client-1: disconnected
[2014-01-27 12:09:35.553628] E [socket.c:1715:socket_connect_finish] 
0-teoswitch_greetings-client-1: connection to <SERVER_B_IP>:24011 failed 
(Connection timed out)
[2014-01-27 12:10:13.588148] E [socket.c:1715:socket_connect_finish] 
0-license_path-client-1: connection to <SERVER_B_IP>:24013 failed (Connection 
timed out)
[2014-01-27 12:10:15.593699] E [socket.c:1715:socket_connect_finish] 
0-upload_path-client-1: connection to <SERVER_B_IP>:24009 failed (Connection 
timed out)
[2014-01-27 12:10:21.601670] E [socket.c:1715:socket_connect_finish] 
0-teoswitch_ivr_greetings-client-1: connection to <SERVER_B_IP>:24012 failed 
(Connection timed out)
[2014-01-27 12:10:23.607312] E [socket.c:1715:socket_connect_finish] 
0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed 
(Connection timed out)
[2014-01-27 12:11:21.866604] E [afr-self-heald.c:418:_crawl_proceed] 
0-teoswitch_ivr_greetings-replicate-0: Stopping crawl as < 2 children are up
[2014-01-27 12:11:21.867874] E [afr-self-heald.c:418:_crawl_proceed] 
0-teoswitch_greetings-replicate-0: Stopping crawl as < 2 children are up
[2014-01-27 12:11:21.868134] E [afr-self-heald.c:418:_crawl_proceed] 
0-teoswitch_custom_music-replicate-0: Stopping crawl as < 2 children are up
[2014-01-27 12:11:21.869417] E [afr-self-heald.c:418:_crawl_proceed] 
0-license_path-replicate-0: Stopping crawl as < 2 children are up
[2014-01-27 12:11:21.869659] E [afr-self-heald.c:418:_crawl_proceed] 
0-upload_path-replicate-0: Stopping crawl as < 2 children are up
[2014-01-27 12:12:53.948154] I 
[client-handshake.c:1636:select_server_supported_programs] 
0-teoswitch_greetings-client-1: Using Program GlusterFS 3.3.0, Num (1298437), 
Version (330)
[2014-01-27 12:12:53.952894] I [client-handshake.c:1433:client_setvolume_cbk] 
0-teoswitch_greetings-client-1: Connected to <SERVER_B_IP>:24011, attached to 
remote volume

nfs.log: there are lots of errors, but the one that recurs most often is this:

[2014-01-27 12:12:27.136033] E [socket.c:1715:socket_connect_finish] 
0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed 
(Connection timed out)

Any ideas? The logs tell me nothing beyond confirming that A cannot reach B, 
which makes sense since B is down. But A is not down, and its volume should 
still be accessible. Right?
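
One way to check that reading of the logs is to count warning and error records per translator. The following is only a minimal sketch: it assumes the record layout seen in the excerpts above ([timestamp] LEVEL [file:line:function] name: message) and that long records are wrapped over several lines, as they are in this mail. The names join_wrapped and summarize are hypothetical helpers, not part of GlusterFS.

```python
import re
from collections import Counter

# A new record starts with a bracketed timestamp. Wrapped continuation lines
# may also start with "[" (e.g. "[client-handshake.c:..."), so checking for a
# leading bracket alone is not enough.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

# Assumed layout: [timestamp] LEVEL [file:line:function] name: message
RECORD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S*):\s+(?P<msg>.*)$"
)


def join_wrapped(text):
    """Rebuild one string per log record from mail-wrapped lines."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line  # continuation of the previous record
    return records


def summarize(text, levels=("E", "W")):
    """Count records per (level, translator name) for the given levels."""
    counts = Counter()
    for record in join_wrapped(text):
        match = RECORD.match(record)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("xlator"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < glustershd.log
    for (level, xlator), n in summarize(sys.stdin.read()).most_common():
        print(level, xlator, n)
```

If every counted record names a client-1 translator or the peer on port 24007, that supports the reading that A only fails to reach B; it does not by itself explain why operations on A hang.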

Nothing very obvious from these logs.

Can you share the relevant portions of the client log file? Usually the name of the mount point is part of the client log file's name.
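
To locate that file, here is a minimal sketch of the usual naming convention. It assumes the default log directory /var/log/glusterfs and that the FUSE client names its log after the mount point with the slashes replaced by dashes; a mount that was given an explicit log file option will not follow this. The function name client_log_path and the mount point /mnt/gluster are hypothetical.

```python
from pathlib import PurePosixPath


def client_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Guess the client log path for a mount point (assumed convention)."""
    # Drop leading/trailing slashes, then turn the remaining ones into dashes.
    name = mount_point.strip("/").replace("/", "-")
    return str(PurePosixPath(log_dir) / (name + ".log"))


if __name__ == "__main__":
    # Hypothetical mount point, for illustration only.
    print(client_log_path("/mnt/gluster"))
```

Listing the log directory and looking for a file whose name contains the mount point is a reasonable fallback if the guessed path does not exist.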

-Vijay

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users