can you please share the glusterd log file? On Thu, Jun 15, 2017 at 5:18 PM, Guy Cukierman <[email protected]> wrote:
> Hi, > > I’m having a similar issue, were you able to solve it? > > Thanks. > > > > > > > > Hey all, > > > > I've got a strange problem going on here. I've installed glusterfs-server > > on ubuntu 16.04: > > glusterfs-client/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic] > > glusterfs-common/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic] > > glusterfs-server/xenial,now 3.7.6-1ubuntu1 amd64 [installed] > > > > I can successfully probe another peer at this point. Then, after installing > > kubernetes via kargo, peer probing begins failing with a timeout. I've > > tried stopping all kubernetes related services, and flushing all iptables > > rules, however I don't see any packets leaving any interface when > > attempting to peer probe. > > > > from cli.log: > > [2017-04-03 22:20:24.704900] I [MSGID: 101190] > > [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread > > with index 1 > > [2017-04-03 22:20:24.704973] T [cli.c:273:cli_rpc_notify] 0-glusterfs: got > > RPC_CLNT_CONNECT > > [2017-04-03 22:20:24.705001] T [cli-quotad-client.c:94:cli_quotad_notify] > > 0-glusterfs: got RPC_CLNT_CONNECT > > [2017-04-03 22:20:24.705014] I [socket.c:2355:socket_event_handler] > > 0-transport: disconnecting now > > [2017-04-03 22:20:24.705204] T [rpc-clnt.c:1404:rpc_clnt_record] > > 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner: > > [2017-04-03 22:20:24.705256] T > > [rpc-clnt.c:1261:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen > > 156, payload: 92, rpc hdr: 64 > > [2017-04-03 22:20:24.705662] T [socket.c:2879:socket_connect] (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[ > 0x7f012fd21953] > > (--> /usr/lib/x86_64-linux-gnu > > /libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4] > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[ > 0x7f012f697af5] > > (--> /usr/lib/x8 > > 6_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338] (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_ > notify+0x23)[0x7f012f6945b3] > > ))))) 0-glusterfs: connect > > () called on transport already connected > > [2017-04-03 22:20:24.705680] D > > [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[ > 0x7f012fd21953] > > (--> / > > usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ > ping_timer_locked+0x84)[0x7f012f69add4] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[ > 0x7f012f > > 697af5] (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+ > 0x88)[0x7f012f698338] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_ > notify+0x23)[0x7f012f6945b3] > > ))) > > )) 0-: /var/run/gluster/quotad.socket: ping timer event already removed > > [2017-04-03 22:20:24.705710] T [cli-quotad-client.c:100:cli_quotad_notify] > > 0-glusterfs: got RPC_CLNT_DISCONNECT > > [2017-04-03 22:20:24.705718] T [rpc-clnt.c:1598:rpc_clnt_submit] > > 0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2, > > Proc: 1) to rpc-transport (glusterfs) > > [2017-04-03 22:20:24.705739] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping] > > 0-glusterfs: ping timeout is 0, returning > > [2017-04-03 22:20:24.705723] D [MSGID: 0] > > [event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation > bumped > > on idx=1 from gen=1 to slot->gen=2, fd=7, slot->fd=7 > > [2017-04-03 22:20:27.614881] T [rpc-clnt.c:418:rpc_clnt_reconnect] > > 0-glusterfs: attempting reconnect > > [2017-04-03 22:20:27.615151] T [socket.c:2879:socket_connect] (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[ > 0x7f012fd21953] > > (--> > > /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/ > socket.so(+0x6c1b)[0x7f012a697c1b] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[ > 0x7f012f695999] > > (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_ > proc+0xfc)[0x7f012fd3d70c] > > (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] ))))) > > 0-glusterfs: connect () called on transport already connected > > > > it then repeats the following: > > [2017-04-03 22:20:27.615177] T [rpc-clnt.c:418:rpc_clnt_reconnect] > > 0-glusterfs: attempting reconnect > > [2017-04-03 22:20:27.615188] T [socket.c:2887:socket_connect] 0-glusterfs: > > connecting 0x25d3550, state=0 gen=0 sock=-1 > > [2017-04-03 22:20:27.615200] T > > [name.c:295:af_unix_client_get_remote_sockaddr] 0-glusterfs: using > > connect-path /var/run/gluster/quotad.socket > > [2017-04-03 22:20:27.615218] T [name.c:111:af_unix_client_bind] > > 0-glusterfs: bind-path not specified for unix socket, letting connect to > > assign default value > > [2017-04-03 22:20:27.615329] T [cli-quotad-client.c:94:cli_quotad_notify] > > 0-glusterfs: got RPC_CLNT_CONNECT > > [2017-04-03 22:20:27.615355] I [socket.c:2355:socket_event_handler] > > 0-transport: disconnecting now > > [2017-04-03 22:20:27.615567] D > > [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[ > 0x7f012fd21953] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ > ping_timer_locked+0x84)[0x7f012f69add4] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[ > 0x7f012f697af5] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+ > 0x88)[0x7f012f698338] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_ > notify+0x23)[0x7f012f6945b3] > > ))))) 0-: /var/run/gluster/quotad.socket: ping timer event already removed > > [2017-04-03 22:20:27.615598] T [cli-quotad-client.c:100:cli_quotad_notify] > > 0-glusterfs: got RPC_CLNT_DISCONNECT > > [2017-04-03 22:20:27.615612] D [MSGID: 0] > > [event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation > bumped > > on idx=1 from gen=4 to slot->gen=5, fd=7, slot->fd=7 > > [2017-04-03 22:20:30.615864] T [rpc-clnt.c:418:rpc_clnt_reconnect] > > 0-glusterfs: attempting reconnect > > [2017-04-03 22:20:30.616113] T [socket.c:2879:socket_connect] (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[ > 0x7f012fd21953] > > (--> > > /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/ > socket.so(+0x6c1b)[0x7f012a697c1b] > > (--> > > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[ > 0x7f012f695999] > > (--> > > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_ > proc+0xfc)[0x7f012fd3d70c] > > (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] ))))) > > 0-glusterfs: connect () called on transport already connected > > > > and from etc-glusterfs-glusterd.vol.log i see the probe request and then: > > [2017-04-03 22:20:24.707258] D [MSGID: 0] > > [glusterd-peer-utils.c:133:glusterd_peerinfo_find_by_hostname] > > 0-management: Unable to find friend: 10.1.0.7 > > [2017-04-03 22:20:24.707260] D [MSGID: 0] > > [glusterd-peer-utils.c:247:glusterd_peerinfo_find] 0-management: Unable to > > find hostname: 10.1.0.7 > > [2017-04-03 22:20:24.707293] I [MSGID: 106129] > > [glusterd-handler.c:3600:glusterd_probe_begin] 0-glusterd: Unable to find > > peerinfo for host: 10.1.0.7 (24007) > > [2017-04-03 22:20:24.707314] D [MSGID: 0] > > [glusterd-peer-utils.c:458:glusterd_peer_hostname_new] 0-glusterd: > > Returning 0 > > [2017-04-03 22:20:24.707329] D [MSGID: 0] > > [glusterd-utils.c:5974:glusterd_sm_tr_log_init] 0-glusterd: returning 0 > > [2017-04-03 22:20:24.707353] D [MSGID: 0] > > [glusterd-store.c:3802:glusterd_store_create_peer_dir] 0-glusterd: > > Returning with 0 > > [2017-04-03 22:20:24.728550] D [MSGID: 0] [store.c:421:gf_store_handle_ > new] > > 0-: Returning 0 > > [2017-04-03 22:20:24.728679] D [MSGID: 0] [store.c:375:gf_store_save_ > value] > > 0-management: returning: 0 > > [2017-04-03 22:20:24.728720] D [logging.c:1952:_gf_msg_internal] > > 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About > > to flush least recently used log message to disk > > The message "D [MSGID: 0] [store.c:375:gf_store_save_value] 0-management: > > returning: 0" repeated 2 times between [2017-04-03 22:20:24.728679] and > > [2017-04-03 22:20:24.728718] > > [2017-04-03 22:20:24.728720] D [MSGID: 0] > > [glusterd-store.c:3931:glusterd_store_peer_write] 0-glusterd: Returning > > with 0 > > [2017-04-03 22:20:24.786973] D [MSGID: 0] > > [glusterd-store.c:3957:glusterd_store_perform_peer_store] 0-glusterd: > > Returning 0 > > [2017-04-03 22:20:24.787007] D [MSGID: 0] > > [glusterd-store.c:3978:glusterd_store_peerinfo] 0-glusterd: Returning > with 0 > > [2017-04-03 22:20:24.787046] D [MSGID: 0] > > [glusterd-handler.c:3391:glusterd_transport_inet_options_build] > 0-glusterd: > > Returning 0 > > [2017-04-03 22:20:24.787077] I [rpc-clnt.c:984:rpc_clnt_connection_init] > > 0-management: setting frame-timeout to 600 > > [2017-04-03 22:20:24.787090] D [rpc-clnt.c:996:rpc_clnt_connection_init] > > 0-management: setting ping-timeout to 30 > > [2017-04-03 22:20:24.787106] D [rpc-transport.c:288:rpc_transport_load] > > 0-rpc-transport: attempt to load file > > /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so > > [2017-04-03 22:20:24.787147] T [MSGID: 0] > > [options.c:91:xlator_option_validate_int] 0-management: no range check > > required for 'option transport.tcp-user-timeout -1' > > [2017-04-03 22:20:24.787166] T [MSGID: 0] > > [options.c:91:xlator_option_validate_int] 0-management: no range check > > required for 'option transport.socket.keepalive-time 10' > > [2017-04-03 22:20:24.787182] T [MSGID: 0] > > [options.c:91:xlator_option_validate_int] 0-management: no range check > > required for 'option transport.socket.keepalive-interval 2' > > [2017-04-03 22:20:24.787203] T [MSGID: 0] > > [options.c:91:xlator_option_validate_int] 0-management: no range check > > required for 'option remote-port 24007' > > [2017-04-03 22:20:24.787221] D [socket.c:3845:socket_init] 0-management: > > Configued transport.tcp-user-timeout=-1 > > [2017-04-03 22:20:24.787233] D [socket.c:3928:socket_init] 0-management: > > SSL support on the I/O path is NOT enabled > > [2017-04-03 22:20:24.787242] D [socket.c:3931:socket_init] 0-management: > > SSL support for glusterd is NOT enabled > > [2017-04-03 22:20:24.787251] D [socket.c:3948:socket_init] 0-management: > > using system polling thread > > [2017-04-03 22:20:24.787263] T [rpc-clnt.c:418:rpc_clnt_reconnect] > > 0-management: attempting reconnect > > [2017-04-03 22:20:24.787273] T [socket.c:2887:socket_connect] 0-management: > > connecting 0x7ff7fc005640, state=0 gen=0 sock=-1 > > [2017-04-03 22:20:24.787288] T [MSGID: 0] > > [common-utils.c:290:gf_resolve_ip6] 0-resolver: DNS cache not present, > > freshly probing hostname: 10.1.0.7 > > [2017-04-03 22:20:24.787843] D [MSGID: 0] > > [common-utils.c:333:gf_resolve_ip6] 0-resolver: returning ip-10.1.0.7 > > (port-24007) for hostname: 10.1.0.7 and port: 24007 > > [2017-04-03 22:20:24.787863] D [socket.c:2833:socket_fix_ssl_opts] > > 0-management: disabling SSL for portmapper connection > > [2017-04-03 22:20:24.787883] T [socket.c:805:__socket_nodelay] > > 0-management: NODELAY enabled for socket 8 > > [2017-04-03 22:20:24.787899] W [socket.c:869:__socket_keepalive] 0-socket: > > failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument > > [2017-04-03 22:20:24.787910] E [socket.c:2965:socket_connect] 0-management: > > Failed to set keep-alive: Invalid argument > > [2017-04-03 22:20:24.788092] D [MSGID: 0] > > [common-utils.c:2887:gf_ports_reserved] 0-glusterfs: lower: 30000, higher: > > 32767 > > [2017-04-03 22:21:52.911012] D [logging.c:1828:gf_log_flush_timeout_cbk] > > 0-logging-infra: Log timer timed out. About to flush outstanding messages > > if present > > [2017-04-03 22:21:52.911131] D [logging.c:1790:__gf_log_ > inject_timer_event] > > 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 > > [2017-04-03 22:22:24.707002] D [socket.c:558:__socket_rwv] > > 0-socket.management: EOF on socket > > [2017-04-03 22:22:24.707032] D [socket.c:2355:socket_event_handler] > > 0-transport: disconnecting now > > > > I've debugged this to the best of my ability but haven't made any progress > > narrowing it down, nor has google turned up anything useful. > > > > Thanks for the help! > > > > Ken > > > > _______________________________________________ > Gluster-users mailing list > [email protected] > http://lists.gluster.org/mailman/listinfo/gluster-users >
_______________________________________________ Gluster-users mailing list [email protected] http://lists.gluster.org/mailman/listinfo/gluster-users
