On 17 October 2017 at 16:39, Stephen Remde <[email protected]> wrote:
> Nithya,
>
> Is there any way to increase the logging level of the brick? There is
> nothing obvious (to me) in the log (see below for the same time period as
> the latest rebalance failure). This is the only brick on that server that
> has disconnects like this.
>

You can use

    gluster volume set <volname> brick-log-level DEBUG

or

    gluster volume set <volname> brick-log-level TRACE

@Milind and Raghavendra G, can you take a look at this to see why there
are so many disconnects?

Regards,
Nithya

>
> Steve
>
> [2017-10-17 02:22:13.453575] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-03-5825-2017/08/30-20:45:55:170091-video-client-4-2-318
> (version: 3.8.15)
> [2017-10-17 02:22:31.353286] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
> [2017-10-17 02:22:31.353326] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
> [2017-10-17 02:22:42.288856] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
> (version: 3.8.13)
> [2017-10-17 02:29:04.889303] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
> [2017-10-17 02:29:04.889347] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
> [2017-10-17 02:29:15.327604] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
> (version: 3.8.13)
> [2017-10-17 02:33:30.745314] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
> [2017-10-17 02:33:30.745360] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:33:30.745396] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
> [2017-10-17 02:33:41.563748] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
> (version: 3.8.13)
> [2017-10-17 02:36:43.833304] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
> [2017-10-17 02:36:43.833342] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:36:43.833371] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
> [2017-10-17 02:36:54.569836] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
> (version: 3.8.13)
> [2017-10-17 02:38:16.697306] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
> [2017-10-17 02:38:16.697370] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:38:16.697432] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
> [2017-10-17 02:38:34.591506] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
> (version: 3.8.13)
> [2017-10-17 02:55:56.473306] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
> [2017-10-17 02:55:56.473366] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
> [2017-10-17 02:56:07.161790] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-20 (version:
> 3.8.8)
> [2017-10-17 03:15:13.529281] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
> [2017-10-17 03:15:13.529330] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:15:13.529400] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
> [2017-10-17 03:15:41.764247] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
> (version: 3.8.13)
> [2017-10-17 03:20:28.921396] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
> [2017-10-17 03:20:28.921498] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
> [2017-10-17 03:20:39.348678] I [login.c:76:gf_auth] 0-auth/login: allowed
> user names: be603ada-6523-44d3-a900-zzzzzzzzzzzz
> [2017-10-17 03:20:39.348909] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
> (version: 3.8.7)
> [2017-10-17 03:27:18.385374] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
> [2017-10-17 03:27:18.385423] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
> [2017-10-17 03:31:47.325285] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
> [2017-10-17 03:31:47.325340] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:31:47.325384] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
> [2017-10-17 03:32:00.855905] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
> (version: 3.8.13)
> [2017-10-17 03:33:23.001337] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
> [2017-10-17 03:33:23.001400] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:33:23.001450] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
> [2017-10-17 03:33:33.860452] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-600
> (version: 3.8.13)
> [2017-10-17 03:54:05.433317] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
> [2017-10-17 03:54:05.433353] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
> [2017-10-17 03:54:15.739343] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-406
> (version: 3.8.13)
>
>
> On 17 October 2017 at 10:26, Nithya Balachandran <[email protected]>
> wrote:
>
>>
>> On 17 October 2017 at 14:48, Stephen Remde <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a rebalance that has failed on one peer twice now. Rebalance logs
>>> below (directories anonymised and some irrelevant log lines cut). It looks
>>> like it loses connection to the brick, but immediately stops the rebalance
>>> on that peer instead of waiting for reconnection - which happens a second
>>> or so later.
>>> Is this normal behaviour? So far it has been the same server and the same
>>> (remote) brick.
>>>
>>> The brick shows a high number of disconnects compared to the other bricks
>>> on the same server:
>>>
>>> ./export-md0-brick.log.1 2
>>> ./export-md1-brick.log.1 2
>>> ./export-md2-brick.log.1 181
>>> ./export-md3-brick.log.1 2
>>>
>>> Any clues? What could be causing this? There is nothing in the log to
>>> indicate the cause.
>>>
>>
>> The rebalance process requires that all DHT child subvols be up during
>> the operation as it needs to reapply the directory layouts (which requires
>> all child subvols to be up). As this is a pure distribute volume, even a
>> single brick getting disconnected is enough to cause the process to stop.
>>
>> You would need to figure out why that brick is disconnecting so often.
>> The brick logs might help with that.
>>
>> Regards,
>> Nithya
>>
>>>
>>> Steve
>>>
>>>
>>> gluster volume info video
>>>
>>> Volume Name: video
>>> Type: Distribute
>>> Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 9
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.0.31:/export/md0/brick
>>> Brick2: 10.0.0.32:/export/md0/brick
>>> Brick3: 10.0.0.31:/export/md1/brick
>>> Brick4: 10.0.0.32:/export/md1/brick
>>> Brick5: 10.0.0.31:/export/md2/brick
>>> Brick6: 10.0.0.32:/export/md2/brick
>>> Brick7: 10.0.0.31:/export/md3/brick
>>> Brick8: 10.0.0.32:/export/md3/brick
>>> Brick9: 10.0.0.33:/export/md0/brick
>>> Options Reconfigured:
>>> network.ping-timeout: 10
>>> cluster.min-free-disk: 1%
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> cluster.rebal-throttle: lazy
>>>
>>> [2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv]
>>> 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>>> [2017-10-12 23:00:55.099709] I [MSGID: 114018]
>>> [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
>>> video-client-4. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>> [2017-10-12 23:00:55.099741] W [MSGID: 109073]
>>> [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>>> [2017-10-12 23:00:55.099752] I [MSGID: 109029]
>>> [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>>> [2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
>>> 0-video-client-4: changing port to 49164 (from 0)
>>> [2017-10-12 23:01:05.481180] I [MSGID: 114057]
>>> [client-handshake.c:1446:select_server_supported_programs]
>>> 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> [2017-10-12 23:01:05.482630] I [MSGID: 114046]
>>> [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected
>>> to video-client-4, attached to remote volume '/export/md2/brick'.
>>> [2017-10-12 23:01:05.482659] I [MSGID: 114047]
>>> [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and
>>> Client lk-version numbers are not same, reopening the fds
>>> [2017-10-12 23:01:05.483365] I [MSGID: 114035]
>>> [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server
>>> lk version = 1
>>> [2017-10-12 23:01:30.310089] I [dht-rebalance.c:2819:gf_defrag_process_dir]
>>> 0-DHT: Found critical error from gf_defrag_get_entry
>>> [2017-10-12 23:01:30.310166] E [MSGID: 109111]
>>> [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht:
>>> gf_defrag_process_dir failed for directory: /y/y/y/y/y
>>> [2017-10-12 23:01:30.380574] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /y/y/y/y/y
>>> [2017-10-12 23:01:30.380756] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /y/y/y/y
>>> [2017-10-12 23:01:30.380879] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /y/y/y
>>> [2017-10-12 23:01:30.380965] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /y/y
>>> [2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit]
>>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba]
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545]
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-:
>>> received signum (15), shutting down
>>>
>>> [2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv]
>>> 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>>> [2017-10-17 03:20:28.921554] I [MSGID: 114018]
>>> [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
>>> video-client-4. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>> [2017-10-17 03:20:28.921570] W [MSGID: 109073]
>>> [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>>> [2017-10-17 03:20:28.921578] I [MSGID: 109029]
>>> [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>>> [2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
>>> 0-video-client-4: changing port to 49164 (from 0)
>>> [2017-10-17 03:20:39.347440] I [MSGID: 114057]
>>> [client-handshake.c:1446:select_server_supported_programs]
>>> 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> [2017-10-17 03:20:39.349244] I [MSGID: 114046]
>>> [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected
>>> to video-client-4, attached to remote volume '/export/md2/brick'.
>>> [2017-10-17 03:20:39.349261] I [MSGID: 114047]
>>> [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and
>>> Client lk-version numbers are not same, reopening the fds
>>> [2017-10-17 03:20:39.350611] I [MSGID: 114035]
>>> [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server
>>> lk version = 1
>>> [2017-10-17 03:27:17.231133] I [dht-rebalance.c:2819:gf_defrag_process_dir]
>>> 0-DHT: Found critical error from gf_defrag_get_entry
>>> [2017-10-17 03:27:17.231214] E [MSGID: 109111]
>>> [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht:
>>> gf_defrag_process_dir failed for directory: /x/x/x/x/x
>>> [2017-10-17 03:27:17.562481] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /x/x/x/x/x
>>> [2017-10-17 03:27:17.562619] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /x/x/x/x
>>> [2017-10-17 03:27:17.562726] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /x/x/x
>>> [2017-10-17 03:27:17.562810] E [MSGID: 109016]
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed
>>> for /x/x
>>> [2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit]
>>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba]
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545]
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-:
>>> received signum (15), shutting down
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>
> --
>
> Dr Stephen Remde
> Director, Innovation and Research
>
>
> T: 01535 280066
> M: 07764 740920
> E: [email protected]
> W: www.gaist.co.uk
>
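
To see which clients account for the disconnects on a brick, the brick log can be tallied per client host. The following is only a rough sketch: it assumes each log entry sits on a single line in the actual log file (the mail client wrapped them in the excerpt quoted in this thread), that MSGID 115036 marks a "disconnecting connection" event as in that excerpt, and that client IDs look like host-PID-date-... as they do there. The names `count_disconnects` and `client_host` are made up for this illustration.

```python
import re
from collections import Counter

# MSGID 115036 is the "disconnecting connection from <client>" message
# that appears in the brick log excerpt in this thread.
DISCONNECT = re.compile(r"\[MSGID: 115036\].*disconnecting connection from (\S+)")

# Assumed client ID shape: <host>-<pid>-<YYYY/MM/DD>-<time>-<volume>-client-...
CLIENT_ID = re.compile(r"(.+)-\d+-\d{4}/\d{2}/\d{2}-")


def client_host(client_id):
    """Return the host part of a client ID, or the ID unchanged if it does not fit."""
    match = CLIENT_ID.match(client_id)
    return match.group(1) if match else client_id


def count_disconnects(lines):
    """Count disconnect events per client host in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[client_host(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python count_disconnects.py export-md2-brick.log.1
    with open(sys.argv[1], errors="replace") as handle:
        for host, total in count_disconnects(handle).most_common():
            print(host, total)
```

If one host dominates the tally, that points at that client or its network path; an even spread across clients would point more towards the brick or its server.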
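
The rebalance logs in this thread also show how long the client takes to reattach to the brick after each drop. A small sketch for measuring that gap follows, under the same assumption of one log entry per line in the real file, and taking MSGID 114018 ("disconnected from") and MSGID 114046 ("Connected to") as the two markers, as they appear in the quoted rebalance log. `reconnect_gaps` is a name invented for this example.

```python
import re
from datetime import datetime

# Leading timestamp, log level, and MSGID of a GlusterFS log line as seen in this thread.
STAMP = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] [A-Z] \[MSGID: (\d+)\]"
)

DISCONNECTED = "114018"  # "disconnected from <subvol>" in the rebalance log
CONNECTED = "114046"     # "Connected to <subvol>, attached to remote volume ..."


def reconnect_gaps(lines):
    """Return the seconds between each disconnect and the next reconnect."""
    gaps = []
    down_at = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # lines without a MSGID (for example socket.c warnings) are skipped
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msgid = match.group(2)
        if msgid == DISCONNECTED:
            down_at = when
        elif msgid == CONNECTED and down_at is not None:
            gaps.append((when - down_at).total_seconds())
            down_at = None
    return gaps
```

Applied to the two rebalance excerpts quoted in this thread, the gaps come out at roughly 10.4 seconds each (23:00:55.099709 to 23:01:05.482630, and 03:20:28.921554 to 03:20:39.349244). By then the rebalance process had already received CHILD_DOWN and stopped.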
