I replaced the brick on one node of my 3x2 distributed-replicated volume (RHS 3). The heal process, which should essentially be a dump from the working replica to the newly added one, is taking exceptionally long: it has moved ~100 G over a day on a 1 Gigabit network. CPU usage on both nodes of the replica pair has been pretty high. I also think that Nagios is making it worse. The heal is slow enough as it is, and Nagios keeps triggering heal info, which I think never completes. I also see my logs filling up.

These are some of the log contents, which I got by running tail on them:

cli.log:

[2015-08-06 19:52:20.926000] T [socket.c:2759:socket_connect] (-->/lib64/libpthread.so.0() [0x3ec1407a51] (-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0x120) [0x7fb84c0f6980] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xd9) [0x7fb84bc96249]))) 0-glusterfs: connect () called on transport already connected
[2015-08-06 19:52:21.926068] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-08-06 19:52:21.926091] T [socket.c:2767:socket_connect] 0-glusterfs: connecting 0xa198b0, state=0 gen=0 sock=-1
[2015-08-06 19:52:21.926114] W [dict.c:1060:data_to_str] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x6bea) [0x7fb844f82bea] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7fb844f873bd] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb844f87270]))) 0-dict: data is NULL
[2015-08-06 19:52:21.926125] W [dict.c:1060:data_to_str] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x6bea) [0x7fb844f82bea] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7fb844f873bd] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb844f8727b]))) 0-dict: data is NULL
[2015-08-06 19:52:21.926129] E [name.c:140:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2015-08-06 19:52:21.926179] T [cli-quotad-client.c:100:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_DISCONNECT

The brick log is full of these messages:

[2015-08-06 19:54:22.494254] I [server-rpc-fops.c:693:server_removexattr_cbk] 0-gluster-server: 2206495: REMOVEXATTR file path (fadccb1e-ea0c-416a-94ec-ec88fafec2a5) of key security.ima ==> (No data available)
[2015-08-06 19:54:22.514814] E [marker.c:2574:marker_removexattr_cbk] 0-gluster-marker: No data available occurred while creating symlinks

sestatus:
SELinux status: disabled

glusterfs --version:
glusterfs 3.6.0.53 built on Mar 18 2015 08:12:38

Does anyone know what's going on?

PS: I am using RHS because our school's satellite has the repos. Contacting RHN over this would likely be complicated, and I would prefer solving this on my own.
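
For scale, here is the back-of-the-envelope arithmetic behind the "~100 G over a day" figure above. This is only a rough sketch: it assumes "100 G" means 100 GB (decimal), "a day" means 24 hours, and the link is a raw 1 Gbit/s with no protocol overhead counted. The variable names are just for illustration.

```python
# Rough heal throughput from the figures in this post.
# Assumptions: 100 G = 100 * 10**9 bytes, one day = 24 h, link = 1 Gbit/s raw.
healed_bytes = 100 * 10**9
elapsed_seconds = 24 * 60 * 60
link_bits_per_second = 10**9

bytes_per_second = healed_bytes / elapsed_seconds
bits_per_second = bytes_per_second * 8
link_fraction = bits_per_second / link_bits_per_second

print(f"heal rate: {bytes_per_second / 10**6:.2f} MB/s")
print(f"as bits:   {bits_per_second / 10**6:.2f} Mbit/s")
print(f"link use:  {link_fraction:.1%}")
```

Under those assumptions the heal is running at roughly 1.16 MB/s, which is under 1% of what the link could carry, so the network itself does not look like the bottleneck.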
