Hi,
I have a rebalance that has failed on one peer twice now. Rebalance logs are below (directories anonymised and some irrelevant log lines cut). It looks like the rebalance process loses its connection to the brick, then immediately stops the rebalance on that peer instead of waiting for the reconnection, which the log timestamps show happening about ten seconds later. Is this normal behaviour? So far it has been the same server and the same (remote) brick.

The brick shows a high number of disconnects compared to the other bricks on the same server:

./export-md0-brick.log.1 2
./export-md1-brick.log.1 2
./export-md2-brick.log.1 181
./export-md3-brick.log.1 2

Any clues? What could be causing this? There is nothing in the log to indicate a cause.

Steve

gluster volume info video

Volume Name: video
Type: Distribute
Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.0.0.31:/export/md0/brick
Brick2: 10.0.0.32:/export/md0/brick
Brick3: 10.0.0.31:/export/md1/brick
Brick4: 10.0.0.32:/export/md1/brick
Brick5: 10.0.0.31:/export/md2/brick
Brick6: 10.0.0.32:/export/md2/brick
Brick7: 10.0.0.31:/export/md3/brick
Brick8: 10.0.0.32:/export/md3/brick
Brick9: 10.0.0.33:/export/md0/brick
Options Reconfigured:
network.ping-timeout: 10
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.rebal-throttle: lazy

[2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv] 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
[2017-10-12 23:00:55.099709] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from video-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2017-10-12 23:00:55.099741] W [MSGID: 109073] [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
[2017-10-12 23:00:55.099752] I [MSGID: 109029] [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-video-client-4: changing port to 49164 (from 0)
[2017-10-12 23:01:05.481180] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-10-12 23:01:05.482630] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to video-client-4, attached to remote volume '/export/md2/brick'.
[2017-10-12 23:01:05.482659] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2017-10-12 23:01:05.483365] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk version = 1
[2017-10-12 23:01:30.310089] I [dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2017-10-12 23:01:30.310166] E [MSGID: 109111] [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir failed for directory: /y/y/y/y/y
[2017-10-12 23:01:30.380574] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y/y/y
[2017-10-12 23:01:30.380756] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y/y
[2017-10-12 23:01:30.380879] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y/y
[2017-10-12 23:01:30.380965] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /y/y
[2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-: received signum (15), shutting down

[2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv] 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
[2017-10-17 03:20:28.921554] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from video-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2017-10-17 03:20:28.921570] W [MSGID: 109073] [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
[2017-10-17 03:20:28.921578] I [MSGID: 109029] [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-video-client-4: changing port to 49164 (from 0)
[2017-10-17 03:20:39.347440] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-10-17 03:20:39.349244] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to video-client-4, attached to remote volume '/export/md2/brick'.
[2017-10-17 03:20:39.349261] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2017-10-17 03:20:39.350611] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk version = 1
[2017-10-17 03:27:17.231133] I [dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2017-10-17 03:27:17.231214] E [MSGID: 109111] [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir failed for directory: /x/x/x/x/x
[2017-10-17 03:27:17.562481] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x/x/x
[2017-10-17 03:27:17.562619] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x/x
[2017-10-17 03:27:17.562726] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x/x
[2017-10-17 03:27:17.562810] E [MSGID: 109016] [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for /x/x
[2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-: received signum (15), shutting down
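
To put a number on how long each outage lasts, the gap between a "disconnected from" message and the next "Connected to" message for the same client can be read straight from the timestamps. The following is only a rough sketch: the function name `reconnect_gaps` is made up for illustration, and the message strings it matches are simply those seen in the excerpt above, so they may differ on other GlusterFS versions.

```python
import re
from datetime import datetime

# Timestamp, then the "0-<volume>-client-<n>:" prefix, then the message text.
# The pattern is an assumption based on the log lines quoted in this mail.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\].*? 0-([\w-]+-client-\d+): (.*)$"
)

def reconnect_gaps(lines):
    """Yield (client, disconnect_time, seconds_until_reconnect) tuples."""
    pending = {}  # client name -> time of its most recent disconnect
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        client, msg = m.group(2), m.group(3)
        if msg.startswith("disconnected from"):
            pending[client] = ts
        elif msg.startswith("Connected to") and client in pending:
            start = pending.pop(client)
            yield client, start, (ts - start).total_seconds()

# Two lines copied from the first failure in the log above.
SAMPLE_LINES = [
    "[2017-10-12 23:00:55.099709] I [MSGID: 114018] [client.c:2280:client_rpc_notify] "
    "0-video-client-4: disconnected from video-client-4. Client process will keep "
    "trying to connect to glusterd until brick's port is available",
    "[2017-10-12 23:01:05.482630] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] "
    "0-video-client-4: Connected to video-client-4, attached to remote volume '/export/md2/brick'.",
]

for client, start, gap in reconnect_gaps(SAMPLE_LINES):
    print(f"{client}: down at {start}, reconnected after {gap:.1f}s")
```

For the first failure this reports a gap of about 10.4 seconds. To scan a whole rebalance log, pass an open file object in place of `SAMPLE_LINES`.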
