Okay, so it was fixed by killing Gluster and rebooting the node again.
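For anyone watching a similar heal: the quoted message below mentions that the "gluster vol heal $VOL info" count goes up and down, which is hard to judge by eye. Here is a minimal Python sketch that totals the "Number of entries:" lines so the backlog can be compared between runs. The function names and the SAMPLE text are made up for illustration (the sample is not output from this cluster), and the parser only assumes the usual "Brick ..." / "Number of entries: N" layout of the heal info output.

```python
import re
import subprocess

# Matches the per-brick summary line printed by `gluster volume heal <VOL> info`.
# A non-numeric count (e.g. for a disconnected brick) is skipped on purpose.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$")


def pending_per_brick(heal_info_text):
    """Return {brick: pending entry count} parsed from heal info text."""
    counts = {}
    brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = ENTRY_RE.match(line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


def total_pending(heal_info_text):
    """Sum the pending entries over all bricks."""
    return sum(pending_per_brick(heal_info_text).values())


def read_heal_info(volume):
    """Run the gluster CLI on a storage node and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Illustrative text only, NOT taken from the poster's cluster.
SAMPLE = """\
Brick gluster01:/mnt/ovirt_disk1/ovirt_imgs
Status: Connected
Number of entries: 0

Brick gluster03:/mnt/ovirt_disk1/ovirt_imgs
/example/path/one
/example/path/two
Status: Connected
Number of entries: 2
"""

print(total_pending(SAMPLE))
```

On a real node, total_pending(read_heal_info("ovirt_imgs")) could be logged every few minutes; a total that trends down over hours suggests the heal is progressing even if single readings fluctuate.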
--
Respectfully
Mahdi A. Mahdi

________________________________
From: [email protected] <[email protected]> on behalf of Mahdi Adnan <[email protected]>
Sent: Wednesday, May 3, 2017 10:15:45 AM
To: [email protected]
Subject: [Gluster-users] Gluster long healing process

Hi,

I have a 4-node Gluster volume; each node has 24 SSD bricks and runs Gluster 3.8.10 (two volumes). I updated one of the nodes to 3.8.11 and rebooted it. After it came back online the healing process started, and it has never finished. It has been 24 hours and the healing is still going. "gluster vol heal $VOL info" returns the number of entries that need healing, and that number decreases and increases randomly. The node is writing many gigabytes, and I don't know whether this is normal or whether I'm missing something.

Volume details:

Volume Name: ovirt_imgs
Type: Distributed-Replicate
Volume ID: 40d1354b-8e85-4464-8c71-9e2efbe10a63
Status: Started
Snapshot Count: 0
Number of Bricks: 26 x 2 = 52
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/ovirt_disk1/ovirt_imgs
Brick2: gluster03:/mnt/ovirt_disk1/ovirt_imgs
Brick3: gluster02:/mnt/ovirt_disk1/ovirt_imgs
Brick4: gluster04:/mnt/ovirt_disk1/ovirt_imgs
Brick5: gluster01:/mnt/ovirt_disk2/ovirt_imgs
Brick6: gluster03:/mnt/ovirt_disk2/ovirt_imgs
Brick7: gluster02:/mnt/ovirt_disk2/ovirt_imgs
Brick8: gluster04:/mnt/ovirt_disk2/ovirt_imgs
Brick9: gluster01:/mnt/ovirt_disk3/ovirt_imgs
Brick10: gluster03:/mnt/ovirt_disk3/ovirt_imgs
Brick11: gluster02:/mnt/ovirt_disk3/ovirt_imgs
Brick12: gluster04:/mnt/ovirt_disk3/ovirt_imgs
Brick13: gluster01:/mnt/ovirt_disk4/ovirt_imgs
Brick14: gluster03:/mnt/ovirt_disk4/ovirt_imgs
Brick15: gluster02:/mnt/ovirt_disk4/ovirt_imgs
Brick16: gluster04:/mnt/ovirt_disk4/ovirt_imgs
Brick17: gluster01:/mnt/ovirt_disk5/ovirt_imgs
Brick18: gluster03:/mnt/ovirt_disk5/ovirt_imgs
Brick19: gluster02:/mnt/ovirt_disk5/ovirt_imgs
Brick20: gluster04:/mnt/ovirt_disk5/ovirt_imgs
Brick21: gluster01:/mnt/ovirt_disk6/ovirt_imgs
Brick22: gluster03:/mnt/ovirt_disk6/ovirt_imgs
Brick23: gluster02:/mnt/ovirt_disk6/ovirt_imgs
Brick24: gluster04:/mnt/ovirt_disk6/ovirt_imgs
Brick25: gluster01:/mnt/ovirt_disk7/ovirt_imgs
Brick26: gluster03:/mnt/ovirt_disk7/ovirt_imgs
Brick27: gluster02:/mnt/ovirt_disk7/ovirt_imgs
Brick28: gluster04:/mnt/ovirt_disk7/ovirt_imgs
Brick29: gluster01:/mnt/ovirt_disk8/ovirt_imgs
Brick30: gluster03:/mnt/ovirt_disk8/ovirt_imgs
Brick31: gluster02:/mnt/ovirt_disk8/ovirt_imgs
Brick32: gluster04:/mnt/ovirt_disk8/ovirt_imgs
Brick33: gluster01:/mnt/ovirt_disk9/ovirt_imgs
Brick34: gluster03:/mnt/ovirt_disk9/ovirt_imgs
Brick35: gluster02:/mnt/ovirt_disk9/ovirt_imgs
Brick36: gluster04:/mnt/ovirt_disk9/ovirt_imgs
Brick37: gluster01:/mnt/ovirt_disk10/ovirt_imgs
Brick38: gluster03:/mnt/ovirt_disk10/ovirt_imgs
Brick39: gluster02:/mnt/ovirt_disk10/ovirt_imgs
Brick40: gluster04:/mnt/ovirt_disk10/ovirt_imgs
Brick41: gluster01:/mnt/ovirt_disk11/ovirt_imgs
Brick42: gluster03:/mnt/ovirt_disk11/ovirt_imgs
Brick43: gluster02:/mnt/ovirt_disk11/ovirt_imgs
Brick44: gluster04:/mnt/ovirt_disk11/ovirt_imgs
Brick45: gluster01:/mnt/ovirt_disk12/ovirt_imgs
Brick46: gluster03:/mnt/ovirt_disk12/ovirt_imgs
Brick47: gluster02:/mnt/ovirt_disk12/ovirt_imgs
Brick48: gluster04:/mnt/ovirt_disk12/ovirt_imgs
Brick49: gluster01:/mnt/ovirt_disk13/ovirt_imgs
Brick50: gluster03:/mnt/ovirt_disk13/ovirt_imgs
Brick51: gluster02:/mnt/ovirt_disk13/ovirt_imgs
Brick52: gluster04:/mnt/ovirt_disk13/ovirt_imgs
Options Reconfigured:
ganesha.enable: off
features.cache-invalidation: off
features.shard-block-size: 256MB
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: none
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.server-quorum-ratio: 51%
nfs-ganesha: enable
cluster.enable-shared-storage: enable

OS: CentOS 7.3, latest.

Gluster heal log sample:

[2017-05-03 07:01:29.487108] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-45: changing port to 49571 (from 0)
[2017-05-03 07:01:29.489004] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-47: parent translators are ready, attempting connect on transport
[2017-05-03 07:01:29.491077] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-44: Connected to ovirt_imgs-client-44, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.491092] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-44: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.491123] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-22: Subvolume 'ovirt_imgs-client-44' came back up; going online.
[2017-05-03 07:01:29.491173] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-44: Server lk version = 1
[2017-05-03 07:01:29.491280] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-45: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.491331] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-46: changing port to 49521 (from 0)
[2017-05-03 07:01:29.493119] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-48: parent translators are ready, attempting connect on transport
[2017-05-03 07:01:29.495480] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-45: Connected to ovirt_imgs-client-45, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.495496] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-45: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.495670] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-46: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.495729] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-45: Server lk version = 1
[2017-05-03 07:01:29.495798] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-47: changing port to 49465 (from 0)
[2017-05-03 07:01:29.497438] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-49: parent translators are ready, attempting connect on transport
[2017-05-03 07:01:29.499871] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-46: Connected to ovirt_imgs-client-46, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.499887] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-46: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.499915] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-23: Subvolume 'ovirt_imgs-client-46' came back up; going online.
[2017-05-03 07:01:29.500015] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-46: Server lk version = 1
[2017-05-03 07:01:29.500032] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-48: changing port to 49645 (from 0)
[2017-05-03 07:01:29.500052] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-47: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.501776] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-50: parent translators are ready, attempting connect on transport
[2017-05-03 07:01:29.504191] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-47: Connected to ovirt_imgs-client-47, attached to remote volume '/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.504208] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-47: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.504313] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-47: Server lk version = 1
[2017-05-03 07:01:29.504330] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-48: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.504462] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-49: changing port to 49572 (from 0)
[2017-05-03 07:01:29.506374] I [MSGID: 114020] [client.c:2356:notify] 0-ovirt_imgs-client-51: parent translators are ready, attempting connect on transport
[2017-05-03 07:01:29.508431] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-48: Connected to ovirt_imgs-client-48, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.508456] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-48: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.508498] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-24: Subvolume 'ovirt_imgs-client-48' came back up; going online.
[2017-05-03 07:01:29.508556] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-48: Server lk version = 1
[2017-05-03 07:01:29.508603] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-49: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.508725] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-50: changing port to 49522 (from 0)
[2017-05-03 07:01:29.510779] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-49: Connected to ovirt_imgs-client-49, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.510796] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-49: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.510903] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-49: Server lk version = 1
[2017-05-03 07:01:29.511062] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-ovirt_imgs-client-51: changing port to 49466 (from 0)
[2017-05-03 07:01:29.512828] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-50: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.513197] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-50: Connected to ovirt_imgs-client-50, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.513214] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-50: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.513236] I [MSGID: 108005] [afr-common.c:4387:afr_notify] 0-ovirt_imgs-replicate-25: Subvolume 'ovirt_imgs-client-50' came back up; going online.
[2017-05-03 07:01:29.513314] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-50: Server lk version = 1
[2017-05-03 07:01:29.515127] I [MSGID: 114057] [client-handshake.c:1440:select_server_supported_programs] 0-ovirt_imgs-client-51: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-03 07:01:29.515520] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-51: Connected to ovirt_imgs-client-51, attached to remote volume '/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.515530] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-51: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.515628] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-51: Server lk version = 1
[2017-05-03 07:01:30.009624] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-40: Connected to ovirt_imgs-client-40, attached to remote volume '/mnt/ovirt_disk11/ovirt_imgs'.
[2017-05-03 07:01:30.009653] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-40: Server and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:30.234722] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-40: Server lk version = 1
[2017-05-03 07:01:30.235633] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-0: selecting local read_child ovirt_imgs-client-0
[2017-05-03 07:01:30.236983] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-2: selecting local read_child ovirt_imgs-client-4
[2017-05-03 07:01:30.237492] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-4: selecting local read_child ovirt_imgs-client-8
[2017-05-03 07:01:30.238310] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-6: selecting local read_child ovirt_imgs-client-12
[2017-05-03 07:01:30.238553] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-8: selecting local read_child ovirt_imgs-client-16
[2017-05-03 07:01:30.238670] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-10: selecting local read_child ovirt_imgs-client-20
[2017-05-03 07:01:30.238791] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-12: selecting local read_child ovirt_imgs-client-24
[2017-05-03 07:01:30.238881] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-14: selecting local read_child ovirt_imgs-client-28
[2017-05-03 07:01:30.238961] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-16: selecting local read_child ovirt_imgs-client-32
[2017-05-03 07:01:30.239014] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-18: selecting local read_child ovirt_imgs-client-36
[2017-05-03 07:01:30.239100] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-22: selecting local read_child ovirt_imgs-client-44
[2017-05-03 07:01:30.239140] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-20: selecting local read_child ovirt_imgs-client-40
[2017-05-03 07:01:30.239150] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-ovirt_imgs: switched to graph 676c7573-7465-7230-312d-31333836322d (0)
[2017-05-03 07:01:30.239200] I [MSGID: 108031] [afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-24: selecting local read_child ovirt_imgs-client-48

I appreciate the help.

Thanks

--
Respectfully
Mahdi A. Mahdi
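A note for reading the log sample against the brick list: the names ovirt_imgs-client-N and ovirt_imgs-replicate-K appear to follow the brick order in the volume info, with client-N being the (N+1)th brick and each replicate-K covering two consecutive clients. The Python sketch below rebuilds the brick list from its visible pattern and checks that mapping. The zero-based numbering is an assumption inferred from the log lines (for example client-44 attaching to /mnt/ovirt_disk12/ovirt_imgs), and the constant and function names are hypothetical.

```python
# Host order within each group of four bricks, as shown in the volume info.
HOST_ORDER = ["gluster01", "gluster03", "gluster02", "gluster04"]
DISKS_PER_NODE = 13   # ovirt_disk1 .. ovirt_disk13 in this volume
REPLICA = 2           # "Number of Bricks: 26 x 2 = 52"


def build_bricks():
    """Rebuild the Brick1..Brick52 list in volume-info order."""
    return [
        f"{host}:/mnt/ovirt_disk{disk}/ovirt_imgs"
        for disk in range(1, DISKS_PER_NODE + 1)
        for host in HOST_ORDER
    ]


def brick_for_client(client_index, bricks):
    """Assumed mapping: client-N is the brick at zero-based position N."""
    return bricks[client_index]


def replicate_for_client(client_index):
    """Assumed mapping: consecutive clients share one replicate subvolume."""
    return client_index // REPLICA


def clients_in_replicate(replicate_index):
    """Client indices that belong to replicate-K under the same assumption."""
    start = replicate_index * REPLICA
    return list(range(start, start + REPLICA))


bricks = build_bricks()
for n in (40, 44, 48):
    print(f"client-{n} -> replicate-{replicate_for_client(n)} -> {brick_for_client(n, bricks)}")
```

Under this mapping every "selecting local read_child" line in the sample points at a gluster01 brick, and replicate-22 pairs the disk12 bricks on gluster01 and gluster03.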
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
