Another small update from me. I have been keeping an eye on the glustershd.log file to see what is going on, and I keep seeing the same file names come up in there every 10 minutes, but not much other activity. Logs below. How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs. I have also increased "cluster.shd-max-threads" from 4 to 8 to try to speed things up.
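[For reference, a minimal sketch of how heal progress could be tracked from the CLI, assuming the volume name gvAA01 used elsewhere in this thread. The idea is that the per-brick pending counts should trend downwards over time if the heal is actually making progress; "info summary" may not exist on 3.12.x, in which case the first two commands are the fallback.]

# gluster volume heal gvAA01 statistics heal-count
(per-brick count of entries still pending heal; re-run periodically and compare)
# gluster volume heal gvAA01 info
(full list of pending entries; can be extremely long on a large backlog)
# gluster volume heal gvAA01 info summary
(one-line summary per brick, only if this gluster version supports it)
# gluster volume get gvAA01 cluster.shd-max-threads
(confirms the shd-max-threads change actually took effect)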
Any ideas here?

Thanks,
- Patrick

On 01-B
-------
[2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904
[2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. sources=[0] 2 sinks=1
[2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1
[2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa
[2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1
[2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1
[2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2 sinks=1
[2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1
[2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885
[2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1
[2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1
[2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45
[2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1
[2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1
[2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9
[2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1
[2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1
[2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d
[2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1

On 02-B
-------
[2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:40.015763] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:33:24.080065] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17

On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie <[email protected]> wrote:
> Just another small update, I'm continuing to watch my brick logs and I
> just saw these errors come up in the recent events too. I am going to
> continue to post any errors I see in the hope of finding the right one to
> try and fix..
> This is from the logs on brick1, seems to be occurring on both nodes on
> brick1, although at different times. I'm not sure what this means, can
> anyone shed any light?
> I guess I am looking for some kind of specific error which may indicate
> something is broken or stuck and locking up and causing the extreme latency
> I'm seeing in the cluster.
> > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) > [0x7f3b3e93158a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) > [0x7f3b3e4c5d45] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <[email protected]> > wrote: > >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. >> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? 
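[An aside on the question above about narrowing down the source of the latency: gluster's built-in profiler reports per-brick FOP latencies, which lines up with the observation elsewhere in the thread that killing one particular brick process restores performance. This is only a sketch using the volume name from the thread; profiling adds a small overhead, so it is usually switched off again once the slow brick has been identified.]

# gluster volume profile gvAA01 start
# gluster volume profile gvAA01 info
(per-brick cumulative and interval stats; a brick whose average LOOKUP/OPENDIR latency is far higher than its peers is a likely culprit)
# gluster volume profile gvAA01 stop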
>> >> I've been going through cluster client the logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've free'd up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <[email protected]> >> wrote: >> >>> Patrick, >>> >>> Sounds like progress. Be aware that gluster is expected to max out the >>> CPUs on at least one of your servers while healing. This is normal and >>> won’t adversely affect overall performance (any more than having bricks in >>> need of healing, at any rate) unless you’re overdoing it. shd threads <= 4 >>> should not do that on your hardware. Other tunings may have also increased >>> overall performance, so you may see higher CPU than previously anyway. I’d >>> recommend upping those thread counts and letting it heal as fast as >>> possible, especially if these are dedicated Gluster storage servers (Ie: >>> not also running VMs, etc). You should see “normal” CPU use one heals are >>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >>> cores). It’s also likely to be different between your servers, in a pure >>> replica, one tends to max and one tends to be a little higher, in a >>> distributed-replica, I’d expect more than one to run harder while healing. >>> >>> Keep the differences between doing an ls on a brick and doing an ls on a >>> gluster mount in mind. When you do a ls on a gluster volume, it isn’t just >>> doing a ls on one brick, it’s effectively doing it on ALL of your bricks, >>> and they all have to return data before the ls succeeds. In a distributed >>> volume, it’s figuring out where on each volume things live and getting the >>> stat() from each to assemble the whole thing. 
And if things are in need of >>> healing, it will take even longer to decide which version is current and >>> use it (shd triggers a heal anytime it encounters this). Any of these >>> things being slow slows down the overall response. >>> >>> At this point, I’d get some sleep too, and let your cluster heal while >>> you do. I’d really want it fully healed before I did any updates anyway, so >>> let it use CPU and get itself sorted out. Expect it to do a round of >>> healing after you upgrade each machine too, this is normal so don’t let the >>> CPU spike surprise you, It’s just catching up from the downtime incurred by >>> the update and/or reboot if you did one. >>> >>> That reminds me, check your gluster cluster.op-version and >>> cluster.max-op-version (gluster vol get all all | grep op-version). If >>> op-version isn’t at the max-op-verison, set it to it so you’re taking >>> advantage of the latest features available to your version. >>> >>> -Darrell >>> >>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <[email protected]> >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've applied the acltype=posixacl on my >>> zpools and I think that has reduced some of the noise from my brick logs. >>> I also bumped up some of the thread counts you suggested but my CPU load >>> skyrocketed, so I dropped it back down to something slightly lower, but >>> still higher than it was before, and will see how that goes for a while. >>> >>> Although low space is a definite issue, if I run an ls anywhere on my >>> bricks directly it's instant, <1 second, and still takes several minutes >>> via gluster, so there is still a problem in my gluster configuration >>> somewhere. We don't have any snapshots, but I am trying to work out if any >>> data on there is safe to delete, or if there is any way I can safely find >>> and delete data which has been removed directly from the bricks in the >>> past. I also have lz4 compression already enabled on each zpool which does >>> help a bit, we get between 1.05 and 1.08x compression on this data. >>> I've tried to go through each client and checked it's cluster mount logs >>> and also my brick logs and looking for errors, so far nothing is jumping >>> out at me, but there are some warnings and errors here and there, I am >>> trying to work out what they mean. >>> >>> It's already 1 am here and unfortunately, I'm still awake working on >>> this issue, but I think that I will have to leave the version upgrades >>> until tomorrow. >>> >>> Thanks again for your advice so far. If anyone has any ideas on where I >>> can look for errors other than brick logs or the cluster mount logs to help >>> resolve this issue, it would be much appreciated. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <[email protected]> >>> wrote: >>> >>>> See inline: >>>> >>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <[email protected]> >>>> wrote: >>>> >>>> Hi Darrell, >>>> >>>> Thanks for your reply, this issue seems to be getting worse over the >>>> last few days, really has me tearing my hair out. I will do as you have >>>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >>>> none of them has "acltype=posixacl" set, currently the acltype property >>>> shows "off", if I make these changes will it apply retroactively to the >>>> existing data? I'm unfamiliar with what this will change so I may need to >>>> look into that before I proceed. 
>>>> >>>> >>>> It is safe to apply that now, any new set/get calls will then use it if >>>> new posixacls exist, and use older if not. ZFS is good that way. It should >>>> clear up your posix_acl and posix errors over time. >>>> >>>> I understand performance is going to slow down as the bricks get full, >>>> I am currently trying to free space and migrate data to some newer storage, >>>> I have fresh several hundred TB storage I just setup recently but with >>>> these performance issues it's really slow. I also believe there is >>>> significant data which has been deleted directly from the bricks in the >>>> past, so if I can reclaim this space in a safe manner then I will have at >>>> least around 10-15% free space. >>>> >>>> >>>> Full ZFS volumes will have a much larger impact on performance than >>>> you’d think, I’d prioritize this. If you have been taking zfs snapshots, >>>> consider deleting them to get the overall volume free space back up. And >>>> just to be sure it’s been said, delete from within the mounted volumes, >>>> don’t delete directly from the bricks (gluster will just try and heal it >>>> later, compounding your issues). Does not apply to deleting other data from >>>> the ZFS volume if it’s not part of the brick directory, of course. >>>> >>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>>> generally they have plenty of resources available, currently only using >>>> around 330/512GB of memory. >>>> >>>> I will look into what your suggested settings will change, and then >>>> will probably go ahead with your recommendations, for our specs as stated >>>> above, what would you suggest for performance.io-thread-count ? >>>> >>>> >>>> I run single 2630v4s on my servers, which have a smaller storage >>>> footprint than yours. I’d go with 32 for performance.io-thread-count. >>>> I’d try 4 for the shd thread settings on that gear. Your memory use sounds >>>> fine, so no worries there. >>>> >>>> Our workload is nothing too extreme, we have a few VMs which write >>>> backup data to this storage nightly for our clients, our VMs don't live on >>>> this cluster, but just write to it. >>>> >>>> >>>> If they are writing compressible data, you’ll get immediate benefit by >>>> setting compression=lz4 on your ZFS volumes. It won’t help any old data, of >>>> course, but it will compress new data going forward. This is another one >>>> that’s safe to enable on the fly. >>>> >>>> I've been going through all of the logs I can, below are some slightly >>>> sanitized errors I've come across, but I'm not sure what to make of them. >>>> The main error I am seeing is the first one below, across several of my >>>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>>> about that yet though. 
>>>> >>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>> with 'user_xattr' flag) >>>> >>>> >>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>> >>>> >>>> posixacls should clear those up, as mentioned. 
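[For reference, a minimal sketch of how the ZFS properties discussed above are typically checked and applied. The dataset name "pool1/brick1" is a placeholder, not taken from this cluster; all three properties can be changed on a live dataset, but acltype only affects ACL operations made after the change, and compression only applies to newly written data.]

# zfs get xattr,acltype,compression pool1/brick1
# zfs set acltype=posixacl pool1/brick1
# zfs set xattr=sa pool1/brick1
# zfs set compression=lz4 pool1/brick1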
>>>> >>>> >>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>>> client: >>>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>>> error-xlator: gvAA01-locks [Invalid argument] >>>> >>>> >>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>>> [0x7ff4ae6f796a] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>>> [0x7ff4ae2a96e8] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>>> >>>> >>>> Fix the posix acls and see if these clear up over time as well, I’m >>>> unclear on what the overall effect of running without the posix acls will >>>> be to total gluster health. Your biggest problem sounds like you need to >>>> free up space on the volumes and get the overall volume health back up to >>>> par and see if that doesn’t resolve the symptoms you’re seeing. >>>> >>>> >>>> >>>> Thank you again for your assistance. It is greatly appreciated. >>>> >>>> - Patrick >>>> >>>> >>>> >>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <[email protected]> >>>> wrote: >>>> >>>>> Patrick, >>>>> >>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>>>> also mention ZFS, and that error you show makes me think you need to check >>>>> to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS >>>>> volumes. >>>>> >>>>> You also observed your bricks are crossing the 95% full line, ZFS >>>>> performance will degrade significantly the closer you get to full. In my >>>>> experience, this starts somewhere between 10% and 5% free space remaining, >>>>> so you’re in that realm. >>>>> >>>>> How’s your free memory on the servers doing? Do you have your zfs arc >>>>> cache limited to something less than all the RAM? It shares pretty well, >>>>> but I’ve encountered situations where other things won’t try and take ram >>>>> back properly if they think it’s in use, so ZFS never gets the opportunity >>>>> to give it up. >>>>> >>>>> Since your volume is a disperse-replica, you might try tuning >>>>> disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more >>>>> if >>>>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>>>> client.event-threads to 8 has proven helpful in many cases. After you get >>>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. >>>>> I >>>>> don’t know if it matters, but I’d also recommend resetting >>>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>>>> also setting performance.io-thread-count to 32 if those have beefy >>>>> CPUs. >>>>> >>>>> Beyond those general ideas, more info about your hardware (CPU and >>>>> RAM) and workload (VMs, direct storage for web servers or enders, etc) may >>>>> net you some more ideas. 
Then you’re going to have to do more digging into >>>>> brick logs looking for errors and/or warnings to see what’s going on. >>>>> >>>>> -Darrell >>>>> >>>>> >>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <[email protected]> >>>>> wrote: >>>>> >>>>> Hello Gluster Users, >>>>> >>>>> I am hoping someone can help me with resolving an ongoing issue I've >>>>> been having, I'm new to mailing lists so forgive me if I have gotten >>>>> anything wrong. We have noticed our performance deteriorating over the >>>>> last >>>>> few weeks, easily measured by trying to do an ls on one of our top-level >>>>> folders, and timing it, which usually would take 2-5 seconds, and now >>>>> takes >>>>> up to 20 minutes, which obviously renders our cluster basically unusable. >>>>> This has been intermittent in the past but is now almost constant and I am >>>>> not sure how to work out the exact cause. We have noticed some errors in >>>>> the brick logs, and have noticed that if we kill the right brick process, >>>>> performance instantly returns back to normal, this is not always the same >>>>> brick, but it indicates to me something in the brick processes or >>>>> background tasks may be causing extreme latency. Due to this ability to >>>>> fix >>>>> it by killing the right brick process off, I think it's a specific file, >>>>> or >>>>> folder, or operation which may be hanging and causing the increased >>>>> latency, but I am not sure how to work it out. One last thing to add is >>>>> that our bricks are getting quite full (~95% full), we are trying to >>>>> migrate data off to new storage but that is going slowly, not helped by >>>>> this issue. I am currently trying to run a full heal as there appear to be >>>>> many files needing healing, and I have all brick processes running so they >>>>> have an opportunity to heal, but this means performance is very poor. It >>>>> currently takes over 15-20 minutes to do an ls of one of our top-level >>>>> folders, which just contains 60-80 other folders, this should take 2-5 >>>>> seconds. This is all being checked by FUSE mount locally on the storage >>>>> node itself, but it is the same for other clients and VMs accessing the >>>>> cluster. Initially, it seemed our NFS mounts were not affected and >>>>> operated >>>>> at normal speed, but testing over the last day has shown that our NFS >>>>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>>>> first thought it might be. >>>>> >>>>> I am not sure how to proceed from here, I am fairly new to gluster >>>>> having inherited this setup from my predecessor and trying to keep it >>>>> going. I have included some info below to try and help with diagnosis, >>>>> please let me know if any further info would be helpful. I would really >>>>> appreciate any advice on what I could try to work out the cause. Thank you >>>>> in advance for reading this, and any suggestions you might be able to >>>>> offer. 
>>>>> >>>>> - Patrick >>>>> >>>>> This is an example of the main error I see in our brick logs, there >>>>> have been others, I can post them when I see them again too: >>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick1/<filename> library: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>>> with 'user_xattr' flag) >>>>> >>>>> Our setup consists of 2 storage nodes and an arbiter node. I have >>>>> noticed our nodes are on slightly different versions, I'm not sure if this >>>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>>>> pools - total capacity is around 560TB. >>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>>>> with iperf and found that it's what would be expected from this config. >>>>> Individual brick performance seems ok, I've tested several bricks >>>>> using dd and can write a 10GB files at 1.7GB/s. >>>>> >>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>> 10000+0 records in >>>>> 10000+0 records out >>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>> >>>>> Node 1: >>>>> # glusterfs --version >>>>> glusterfs 3.12.15 >>>>> >>>>> Node 2: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Arbiter: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Here is our gluster volume status: >>>>> >>>>> # gluster volume status >>>>> Status of volume: gvAA01 >>>>> Gluster process TCP Port RDMA Port >>>>> Online Pid >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck1 49152 0 Y >>>>> 6931 >>>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck2 49153 0 Y >>>>> 6939 >>>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck3 49154 0 Y >>>>> 6947 >>>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck4 49155 0 Y >>>>> 6956 >>>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck5 49156 0 Y >>>>> 6964 >>>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck6 49157 0 Y >>>>> 6974 >>>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck7 49158 0 Y >>>>> 6984 >>>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck8 49159 0 Y >>>>> 6993 >>>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck9 49160 0 Y >>>>> 7001 >>>>> NFS Server on localhost 2049 0 Y >>>>> 17276 >>>>> Self-heal Daemon on localhost N/A N/A Y >>>>> 25245 >>>>> NFS Server on 02-B 2049 0 Y 9089 >>>>> 
Self-heal Daemon on 02-B N/A N/A Y 17838 >>>>> NFS Server on 00-a 2049 0 Y 15660 >>>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>>> >>>>> Task Status of Volume gvAA01 >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> There are no active volume tasks >>>>> >>>>> And gluster volume info: >>>>> >>>>> # gluster volume info >>>>> >>>>> Volume Name: gvAA01 >>>>> Type: Distributed-Replicate >>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 01-B:/brick1/gvAA01/brick >>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>> Brick11: 02-B:/brick4/gvAA01/brick >>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>> Options Reconfigured: >>>>> cluster.shd-max-threads: 4 >>>>> performance.least-prio-threads: 16 >>>>> cluster.readdir-optimize: on >>>>> performance.quick-read: off >>>>> performance.stat-prefetch: off >>>>> cluster.data-self-heal: on >>>>> cluster.lookup-unhashed: auto >>>>> cluster.lookup-optimize: on >>>>> cluster.favorite-child-policy: mtime >>>>> server.allow-insecure: on >>>>> transport.address-family: inet >>>>> client.bind-insecure: on >>>>> cluster.entry-self-heal: off >>>>> cluster.metadata-self-heal: off >>>>> performance.md-cache-timeout: 600 >>>>> cluster.self-heal-daemon: enable >>>>> performance.readdir-ahead: on >>>>> diagnostics.brick-log-level: INFO >>>>> nfs.disable: off >>>>> >>>>> Thank you for any assistance. >>>>> >>>>> - Patrick >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> [email protected] >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>>
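[Closing aside: a sketch of how the tunings suggested earlier in the thread would be applied against the "Options Reconfigured" list shown above. The values are the ones proposed in the thread (event-threads, io-thread-count, shd-max-threads), not verified recommendations for this workload; as noted earlier, higher thread counts will push CPU use up while heals are running.]

# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 cluster.shd-max-threads 8
# gluster volume get gvAA01 all | grep -E 'event-threads|io-thread-count|shd-max-threads'
(the last command just confirms the new values are in effect)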
