On 17 October 2017 at 16:39, Stephen Remde <[email protected]>
wrote:

> Nithya,
>
> Is there any way to increase the logging level of the brick? There is
> nothing obvious (to me) in the log (see below for the same time period as
> the latest rebalance failure). This is the only brick on that server that
> has disconnects like this.
>

You can use
gluster volume set <volname> diagnostics.brick-log-level DEBUG
or
gluster volume set <volname> diagnostics.brick-log-level TRACE
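
A minimal sketch of how this could be scripted, so that the level is raised only while the problem is being reproduced and then put back. The function names are made up for illustration, and the GLUSTER variable is only a hook for previewing the commands (for example GLUSTER=echo). The full option name diagnostics.brick-log-level is used here.

```shell
# Command to run; set GLUSTER=echo to print the arguments instead of running them.
GLUSTER="${GLUSTER:-gluster}"

# Raise the brick log level for a volume (for example DEBUG or TRACE).
# $1 = volume name, $2 = log level
raise_brick_log_level() {
    "$GLUSTER" volume set "$1" diagnostics.brick-log-level "$2"
}

# Reset the option to its default once the logs have been captured.
# $1 = volume name
restore_brick_log_level() {
    "$GLUSTER" volume reset "$1" diagnostics.brick-log-level
}
```

For example: raise_brick_log_level video DEBUG, wait for the next disconnect on the md2 brick, then restore_brick_log_level video. DEBUG and TRACE logs grow quickly, so it is worth resetting the level as soon as a disconnect has been captured.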


@Milind and Raghavendra G, can you take a look at this to see why there
are so many disconnects?
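
In the meantime, it may help to see whether the disconnects come from a few clients or from all of them. Below is a rough sketch that counts the "disconnecting connection" messages (MSGID 115036) in a brick log per client host. It assumes each log entry is on a single line in the actual file (the excerpt quoted below is wrapped by the mail client) and that the client ID has the form host-pid-date-... as in that excerpt. The function name is made up for illustration.

```shell
# Count server-side disconnects per client host in one or more brick logs.
# Prints "<count> <host>" lines, highest count first.
count_disconnects_by_host() {
    grep -h 'MSGID: 115036' "$@" |
        # Keep only the host part of the client ID, i.e. the text before
        # "-<pid>-<YYYY>/<MM>/<DD>-".
        sed -E 's#.*disconnecting connection from (.+)-[0-9]+-[0-9]{4}/[0-9]{2}/[0-9]{2}-.*#\1#' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}
```

For example: count_disconnects_by_host export-md2-brick.log.1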

Regards,
Nithya

>
> Steve
>
> [2017-10-17 02:22:13.453575] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-03-5825-2017/08/30-20:45:55:170091-video-client-4-2-318 
> (version: 3.8.15)
> [2017-10-17 02:22:31.353286] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
> [2017-10-17 02:22:31.353326] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
> [2017-10-17 02:22:42.288856] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404 
> (version: 3.8.13)
> [2017-10-17 02:29:04.889303] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
> [2017-10-17 02:29:04.889347] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
> [2017-10-17 02:29:15.327604] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405 
> (version: 3.8.13)
> [2017-10-17 02:33:30.745314] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
> [2017-10-17 02:33:30.745360] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:33:30.745396] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
> [2017-10-17 02:33:41.563748] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595 
> (version: 3.8.13)
> [2017-10-17 02:36:43.833304] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
> [2017-10-17 02:36:43.833342] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:36:43.833371] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
> [2017-10-17 02:36:54.569836] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596 
> (version: 3.8.13)
> [2017-10-17 02:38:16.697306] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
> [2017-10-17 02:38:16.697370] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 02:38:16.697432] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
> [2017-10-17 02:38:34.591506] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597 
> (version: 3.8.13)
> [2017-10-17 02:55:56.473306] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
> [2017-10-17 02:55:56.473366] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
> [2017-10-17 02:56:07.161790] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-20 (version: 
> 3.8.8)
> [2017-10-17 03:15:13.529281] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
> [2017-10-17 03:15:13.529330] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:15:13.529400] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
> [2017-10-17 03:15:41.764247] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598 
> (version: 3.8.13)
> [2017-10-17 03:20:28.921396] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
> [2017-10-17 03:20:28.921498] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
> [2017-10-17 03:20:39.348678] I [login.c:76:gf_auth] 0-auth/login: allowed 
> user names: be603ada-6523-44d3-a900-zzzzzzzzzzzz
> [2017-10-17 03:20:39.348909] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1 
> (version: 3.8.7)
> [2017-10-17 03:27:18.385374] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
> [2017-10-17 03:27:18.385423] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
> [2017-10-17 03:31:47.325285] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
> [2017-10-17 03:31:47.325340] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:31:47.325384] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
> [2017-10-17 03:32:00.855905] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599 
> (version: 3.8.13)
> [2017-10-17 03:33:23.001337] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
> [2017-10-17 03:33:23.001400] I [MSGID: 115013] 
> [server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
> [2017-10-17 03:33:23.001450] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
> [2017-10-17 03:33:33.860452] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-600 
> (version: 3.8.13)
> [2017-10-17 03:54:05.433317] I [MSGID: 115036] 
> [server.c:548:server_rpc_notify] 0-video-server: disconnecting connection 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
> [2017-10-17 03:54:05.433353] I [MSGID: 101055] 
> [client_t.c:415:gf_client_unref] 0-video-server: Shutting down connection 
> node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
> [2017-10-17 03:54:15.739343] I [MSGID: 115029] 
> [server-handshake.c:692:server_setvolume] 0-video-server: accepted client 
> from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-406 
> (version: 3.8.13)
>
>
> On 17 October 2017 at 10:26, Nithya Balachandran <[email protected]>
> wrote:
>
>>
>>
>> On 17 October 2017 at 14:48, Stephen Remde <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>> I have a rebalance that has failed on one peer twice now. Rebalance logs 
>>> below (directories anonymised and some irrelevant log lines cut). It looks 
>>> like it loses connection to the brick, but immediately stops the rebalance 
>>> on that peer instead of waiting for reconnection - which happens a second 
>>> or so later.
>>> Is this normal behaviour? So far it has been the same server and the same 
>>> (remote) brick.
>>>
>>>
>>> The brick shows a high number of disconnects compared to the other bricks 
>>> on the same server:
>>>
>>>
>>> ./export-md0-brick.log.1      2
>>> ./export-md1-brick.log.1      2
>>> ./export-md2-brick.log.1    181
>>> ./export-md3-brick.log.1      2
>>>
>>>
>>> Any clues? What could be causing this? There is nothing in the log 
>>> to indicate the cause.
>>>
>> The rebalance process requires that all DHT child subvols be up during
>> the operation as it needs to reapply the directory layouts (which requires
>> all child subvols to be up). As this is a pure distribute volume, even a
>> single brick getting disconnected is enough to cause the process to stop.
>>
>> You would need to figure out why that brick is disconnecting so often.
>> The brick logs might help with that.
>>
>> Regards,
>> Nithya
>>
>>
>>>
>>> Steve
>>>
>>>
>>> gluster volume info video
>>>
>>> Volume Name: video
>>> Type: Distribute
>>> Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 9
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.0.31:/export/md0/brick
>>> Brick2: 10.0.0.32:/export/md0/brick
>>> Brick3: 10.0.0.31:/export/md1/brick
>>> Brick4: 10.0.0.32:/export/md1/brick
>>> Brick5: 10.0.0.31:/export/md2/brick
>>> Brick6: 10.0.0.32:/export/md2/brick
>>> Brick7: 10.0.0.31:/export/md3/brick
>>> Brick8: 10.0.0.32:/export/md3/brick
>>> Brick9: 10.0.0.33:/export/md0/brick
>>> Options Reconfigured:
>>> network.ping-timeout: 10
>>> cluster.min-free-disk: 1%
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> cluster.rebal-throttle: lazy
>>>
>>> [2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv] 
>>> 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>>> [2017-10-12 23:00:55.099709] I [MSGID: 114018] 
>>> [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from 
>>> video-client-4. Client process will keep trying to connect to glusterd 
>>> until brick's port is available
>>> [2017-10-12 23:00:55.099741] W [MSGID: 109073] 
>>> [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>>> [2017-10-12 23:00:55.099752] I [MSGID: 109029] 
>>> [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>>> [2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 
>>> 0-video-client-4: changing port to 49164 (from 0)
>>> [2017-10-12 23:01:05.481180] I [MSGID: 114057] 
>>> [client-handshake.c:1446:select_server_supported_programs] 
>>> 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> [2017-10-12 23:01:05.482630] I [MSGID: 114046] 
>>> [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected 
>>> to video-client-4, attached to remote volume '/export/md2/brick'.
>>> [2017-10-12 23:01:05.482659] I [MSGID: 114047] 
>>> [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and 
>>> Client lk-version numbers are not same, reopening the fds
>>> [2017-10-12 23:01:05.483365] I [MSGID: 114035] 
>>> [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server 
>>> lk version = 1
>>> [2017-10-12 23:01:30.310089] I [dht-rebalance.c:2819:gf_defrag_process_dir] 
>>> 0-DHT: Found critical error from gf_defrag_get_entry
>>> [2017-10-12 23:01:30.310166] E [MSGID: 109111] 
>>> [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: 
>>> gf_defrag_process_dir failed for directory: /y/y/y/y/y
>>> [2017-10-12 23:01:30.380574] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /y/y/y/y/y
>>> [2017-10-12 23:01:30.380756] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /y/y/y/y
>>> [2017-10-12 23:01:30.380879] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /y/y/y
>>> [2017-10-12 23:01:30.380965] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /y/y
>>> [2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit] 
>>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba] 
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545] 
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-: 
>>> received signum (15), shutting down
>>>
>>> [2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv] 
>>> 0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>>> [2017-10-17 03:20:28.921554] I [MSGID: 114018] 
>>> [client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from 
>>> video-client-4. Client process will keep trying to connect to glusterd 
>>> until brick's port is available
>>> [2017-10-17 03:20:28.921570] W [MSGID: 109073] 
>>> [dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>>> [2017-10-17 03:20:28.921578] I [MSGID: 109029] 
>>> [dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>>> [2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 
>>> 0-video-client-4: changing port to 49164 (from 0)
>>> [2017-10-17 03:20:39.347440] I [MSGID: 114057] 
>>> [client-handshake.c:1446:select_server_supported_programs] 
>>> 0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> [2017-10-17 03:20:39.349244] I [MSGID: 114046] 
>>> [client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected 
>>> to video-client-4, attached to remote volume '/export/md2/brick'.
>>> [2017-10-17 03:20:39.349261] I [MSGID: 114047] 
>>> [client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and 
>>> Client lk-version numbers are not same, reopening the fds
>>> [2017-10-17 03:20:39.350611] I [MSGID: 114035] 
>>> [client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server 
>>> lk version = 1
>>> [2017-10-17 03:27:17.231133] I [dht-rebalance.c:2819:gf_defrag_process_dir] 
>>> 0-DHT: Found critical error from gf_defrag_get_entry
>>> [2017-10-17 03:27:17.231214] E [MSGID: 109111] 
>>> [dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: 
>>> gf_defrag_process_dir failed for directory: /x/x/x/x/x
>>> [2017-10-17 03:27:17.562481] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /x/x/x/x/x
>>> [2017-10-17 03:27:17.562619] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /x/x/x/x
>>> [2017-10-17 03:27:17.562726] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /x/x/x
>>> [2017-10-17 03:27:17.562810] E [MSGID: 109016] 
>>> [dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed 
>>> for /x/x
>>> [2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit] 
>>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba] 
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545] 
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-: 
>>> received signum (15), shutting down
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>
>
> --
>
> Dr Stephen Remde
> Director, Innovation and Research
>
>
> T: 01535 280066
> M: 07764 740920
> E: [email protected]
> W: www.gaist.co.uk
>