Another small update from me. I have been keeping an eye on the glustershd.log file to see what is going on, and I keep seeing the same file names come up in there every 10 minutes, but not much other activity. Logs below. How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs. I have also increased "cluster.shd-max-threads" from 4 to 8 to try to speed things up.
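[For reference, a minimal sketch of how heal progress could be tracked from the CLI, assuming the volume name gvAA01 used elsewhere in this thread. The idea is that the per-brick pending counts should trend downwards over time if the heal is actually making progress; "info summary" may not exist on 3.12.x, in which case the first two commands are the fallback.]

# gluster volume heal gvAA01 statistics heal-count
(per-brick count of entries still pending heal; re-run periodically and compare)
# gluster volume heal gvAA01 info
(full list of pending entries; can be extremely long on a large backlog)
# gluster volume heal gvAA01 info summary
(one-line summary per brick, only if this gluster version supports it)
# gluster volume get gvAA01 cluster.shd-max-threads
(confirms the shd-max-threads change actually took effect)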
Any ideas here?

Thanks,
- Patrick

On 01-B
-------
[2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904
[2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. sources=[0] 2 sinks=1
[2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1
[2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa
[2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1
[2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1
[2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2 sinks=1
[2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1
[2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885
[2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1
[2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1
[2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45
[2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1
[2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1
[2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9
[2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1
[2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1
[2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d
[2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1

On 02-B
-------
[2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:40.015763] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:33:24.080065] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17

On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie <[email protected]> wrote:
> Just another small update, I'm continuing to watch my brick logs and I
> just saw these errors come up in the recent events too. I am going to
> continue to post any errors I see in the hope of finding the right one to
> try and fix..
> This is from the logs on brick1, seems to be occurring on both nodes on
> brick1, although at different times. I'm not sure what this means, can
> anyone shed any light?
> I guess I am looking for some kind of specific error which may indicate
> something is broken or stuck and locking up and causing the extreme latency
> I'm seeing in the cluster.
> > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) > [0x7f3b3e93158a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) > [0x7f3b3e4c5d45] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <[email protected]> > wrote: > >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. >> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? 
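[An aside on the question above about narrowing down the source of the latency: gluster's built-in profiler reports per-brick FOP latencies, which lines up with the observation elsewhere in the thread that killing one particular brick process restores performance. This is only a sketch using the volume name from the thread; profiling adds a small overhead, so it is usually switched off again once the slow brick has been identified.]

# gluster volume profile gvAA01 start
# gluster volume profile gvAA01 info
(per-brick cumulative and interval stats; a brick whose average LOOKUP/OPENDIR latency is far higher than its peers is a likely culprit)
# gluster volume profile gvAA01 stop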
>> >> I've been going through cluster client the logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've free'd up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <[email protected]> >> wrote: >> >>> Patrick, >>> >>> Sounds like progress. Be aware that gluster is expected to max out the >>> CPUs on at least one of your servers while healing. This is normal and >>> won’t adversely affect overall performance (any more than having bricks in >>> need of healing, at any rate) unless you’re overdoing it. shd threads <= 4 >>> should not do that on your hardware. Other tunings may have also increased >>> overall performance, so you may see higher CPU than previously anyway. I’d >>> recommend upping those thread counts and letting it heal as fast as >>> possible, especially if these are dedicated Gluster storage servers (Ie: >>> not also running VMs, etc). You should see “normal” CPU use one heals are >>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >>> cores). It’s also likely to be different between your servers, in a pure >>> replica, one tends to max and one tends to be a little higher, in a >>> distributed-replica, I’d expect more than one to run harder while healing. >>> >>> Keep the differences between doing an ls on a brick and doing an ls on a >>> gluster mount in mind. When you do a ls on a gluster volume, it isn’t just >>> doing a ls on one brick, it’s effectively doing it on ALL of your bricks, >>> and they all have to return data before the ls succeeds. In a distributed >>> volume, it’s figuring out where on each volume things live and getting the >>> stat() from each to assemble the whole thing. 
And if things are in need of >>> healing, it will take even longer to decide which version is current and >>> use it (shd triggers a heal anytime it encounters this). Any of these >>> things being slow slows down the overall response. >>> >>> At this point, I’d get some sleep too, and let your cluster heal while >>> you do. I’d really want it fully healed before I did any updates anyway, so >>> let it use CPU and get itself sorted out. Expect it to do a round of >>> healing after you upgrade each machine too, this is normal so don’t let the >>> CPU spike surprise you, It’s just catching up from the downtime incurred by >>> the update and/or reboot if you did one. >>> >>> That reminds me, check your gluster cluster.op-version and >>> cluster.max-op-version (gluster vol get all all | grep op-version). If >>> op-version isn’t at the max-op-verison, set it to it so you’re taking >>> advantage of the latest features available to your version. >>> >>> -Darrell >>> >>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <[email protected]> >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've applied the acltype=posixacl on my >>> zpools and I think that has reduced some of the noise from my brick logs. >>> I also bumped up some of the thread counts you suggested but my CPU load >>> skyrocketed, so I dropped it back down to something slightly lower, but >>> still higher than it was before, and will see how that goes for a while. >>> >>> Although low space is a definite issue, if I run an ls anywhere on my >>> bricks directly it's instant, <1 second, and still takes several minutes >>> via gluster, so there is still a problem in my gluster configuration >>> somewhere. We don't have any snapshots, but I am trying to work out if any >>> data on there is safe to delete, or if there is any way I can safely find >>> and delete data which has been removed directly from the bricks in the >>> past. I also have lz4 compression already enabled on each zpool which does >>> help a bit, we get between 1.05 and 1.08x compression on this data. >>> I've tried to go through each client and checked it's cluster mount logs >>> and also my brick logs and looking for errors, so far nothing is jumping >>> out at me, but there are some warnings and errors here and there, I am >>> trying to work out what they mean. >>> >>> It's already 1 am here and unfortunately, I'm still awake working on >>> this issue, but I think that I will have to leave the version upgrades >>> until tomorrow. >>> >>> Thanks again for your advice so far. If anyone has any ideas on where I >>> can look for errors other than brick logs or the cluster mount logs to help >>> resolve this issue, it would be much appreciated. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <[email protected]> >>> wrote: >>> >>>> See inline: >>>> >>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <[email protected]> >>>> wrote: >>>> >>>> Hi Darrell, >>>> >>>> Thanks for your reply, this issue seems to be getting worse over the >>>> last few days, really has me tearing my hair out. I will do as you have >>>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >>>> none of them has "acltype=posixacl" set, currently the acltype property >>>> shows "off", if I make these changes will it apply retroactively to the >>>> existing data? I'm unfamiliar with what this will change so I may need to >>>> look into that before I proceed. 
>>>> >>>> >>>> It is safe to apply that now, any new set/get calls will then use it if >>>> new posixacls exist, and use older if not. ZFS is good that way. It should >>>> clear up your posix_acl and posix errors over time. >>>> >>>> I understand performance is going to slow down as the bricks get full, >>>> I am currently trying to free space and migrate data to some newer storage, >>>> I have fresh several hundred TB storage I just setup recently but with >>>> these performance issues it's really slow. I also believe there is >>>> significant data which has been deleted directly from the bricks in the >>>> past, so if I can reclaim this space in a safe manner then I will have at >>>> least around 10-15% free space. >>>> >>>> >>>> Full ZFS volumes will have a much larger impact on performance than >>>> you’d think, I’d prioritize this. If you have been taking zfs snapshots, >>>> consider deleting them to get the overall volume free space back up. And >>>> just to be sure it’s been said, delete from within the mounted volumes, >>>> don’t delete directly from the bricks (gluster will just try and heal it >>>> later, compounding your issues). Does not apply to deleting other data from >>>> the ZFS volume if it’s not part of the brick directory, of course. >>>> >>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>>> generally they have plenty of resources available, currently only using >>>> around 330/512GB of memory. >>>> >>>> I will look into what your suggested settings will change, and then >>>> will probably go ahead with your recommendations, for our specs as stated >>>> above, what would you suggest for performance.io-thread-count ? >>>> >>>> >>>> I run single 2630v4s on my servers, which have a smaller storage >>>> footprint than yours. I’d go with 32 for performance.io-thread-count. >>>> I’d try 4 for the shd thread settings on that gear. Your memory use sounds >>>> fine, so no worries there. >>>> >>>> Our workload is nothing too extreme, we have a few VMs which write >>>> backup data to this storage nightly for our clients, our VMs don't live on >>>> this cluster, but just write to it. >>>> >>>> >>>> If they are writing compressible data, you’ll get immediate benefit by >>>> setting compression=lz4 on your ZFS volumes. It won’t help any old data, of >>>> course, but it will compress new data going forward. This is another one >>>> that’s safe to enable on the fly. >>>> >>>> I've been going through all of the logs I can, below are some slightly >>>> sanitized errors I've come across, but I'm not sure what to make of them. >>>> The main error I am seeing is the first one below, across several of my >>>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>>> about that yet though. 
>>>> >>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>> with 'user_xattr' flag) >>>> >>>> >>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>> >>>> >>>> posixacls should clear those up, as mentioned. 
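[For reference, a minimal sketch of how the ZFS properties discussed above are typically checked and applied. The dataset name "pool1/brick1" is a placeholder, not taken from this cluster; all three properties can be changed on a live dataset, but acltype only affects ACL operations made after the change, and compression only applies to newly written data.]

# zfs get xattr,acltype,compression pool1/brick1
# zfs set acltype=posixacl pool1/brick1
# zfs set xattr=sa pool1/brick1
# zfs set compression=lz4 pool1/brick1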
>>>> >>>> >>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>>> client: >>>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>>> error-xlator: gvAA01-locks [Invalid argument] >>>> >>>> >>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>>> [0x7ff4ae6f796a] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>>> [0x7ff4ae2a96e8] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>>> >>>> >>>> Fix the posix acls and see if these clear up over time as well, I’m >>>> unclear on what the overall effect of running without the posix acls will >>>> be to total gluster health. Your biggest problem sounds like you need to >>>> free up space on the volumes and get the overall volume health back up to >>>> par and see if that doesn’t resolve the symptoms you’re seeing. >>>> >>>> >>>> >>>> Thank you again for your assistance. It is greatly appreciated. >>>> >>>> - Patrick >>>> >>>> >>>> >>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <[email protected]> >>>> wrote: >>>> >>>>> Patrick, >>>>> >>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>>>> also mention ZFS, and that error you show makes me think you need to check >>>>> to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS >>>>> volumes. >>>>> >>>>> You also observed your bricks are crossing the 95% full line, ZFS >>>>> performance will degrade significantly the closer you get to full. In my >>>>> experience, this starts somewhere between 10% and 5% free space remaining, >>>>> so you’re in that realm. >>>>> >>>>> How’s your free memory on the servers doing? Do you have your zfs arc >>>>> cache limited to something less than all the RAM? It shares pretty well, >>>>> but I’ve encountered situations where other things won’t try and take ram >>>>> back properly if they think it’s in use, so ZFS never gets the opportunity >>>>> to give it up. >>>>> >>>>> Since your volume is a disperse-replica, you might try tuning >>>>> disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more >>>>> if >>>>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>>>> client.event-threads to 8 has proven helpful in many cases. After you get >>>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. >>>>> I >>>>> don’t know if it matters, but I’d also recommend resetting >>>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>>>> also setting performance.io-thread-count to 32 if those have beefy >>>>> CPUs. >>>>> >>>>> Beyond those general ideas, more info about your hardware (CPU and >>>>> RAM) and workload (VMs, direct storage for web servers or enders, etc) may >>>>> net you some more ideas. 
Then you’re going to have to do more digging into >>>>> brick logs looking for errors and/or warnings to see what’s going on. >>>>> >>>>> -Darrell >>>>> >>>>> >>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <[email protected]> >>>>> wrote: >>>>> >>>>> Hello Gluster Users, >>>>> >>>>> I am hoping someone can help me with resolving an ongoing issue I've >>>>> been having, I'm new to mailing lists so forgive me if I have gotten >>>>> anything wrong. We have noticed our performance deteriorating over the >>>>> last >>>>> few weeks, easily measured by trying to do an ls on one of our top-level >>>>> folders, and timing it, which usually would take 2-5 seconds, and now >>>>> takes >>>>> up to 20 minutes, which obviously renders our cluster basically unusable. >>>>> This has been intermittent in the past but is now almost constant and I am >>>>> not sure how to work out the exact cause. We have noticed some errors in >>>>> the brick logs, and have noticed that if we kill the right brick process, >>>>> performance instantly returns back to normal, this is not always the same >>>>> brick, but it indicates to me something in the brick processes or >>>>> background tasks may be causing extreme latency. Due to this ability to >>>>> fix >>>>> it by killing the right brick process off, I think it's a specific file, >>>>> or >>>>> folder, or operation which may be hanging and causing the increased >>>>> latency, but I am not sure how to work it out. One last thing to add is >>>>> that our bricks are getting quite full (~95% full), we are trying to >>>>> migrate data off to new storage but that is going slowly, not helped by >>>>> this issue. I am currently trying to run a full heal as there appear to be >>>>> many files needing healing, and I have all brick processes running so they >>>>> have an opportunity to heal, but this means performance is very poor. It >>>>> currently takes over 15-20 minutes to do an ls of one of our top-level >>>>> folders, which just contains 60-80 other folders, this should take 2-5 >>>>> seconds. This is all being checked by FUSE mount locally on the storage >>>>> node itself, but it is the same for other clients and VMs accessing the >>>>> cluster. Initially, it seemed our NFS mounts were not affected and >>>>> operated >>>>> at normal speed, but testing over the last day has shown that our NFS >>>>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>>>> first thought it might be. >>>>> >>>>> I am not sure how to proceed from here, I am fairly new to gluster >>>>> having inherited this setup from my predecessor and trying to keep it >>>>> going. I have included some info below to try and help with diagnosis, >>>>> please let me know if any further info would be helpful. I would really >>>>> appreciate any advice on what I could try to work out the cause. Thank you >>>>> in advance for reading this, and any suggestions you might be able to >>>>> offer. 
>>>>> >>>>> - Patrick >>>>> >>>>> This is an example of the main error I see in our brick logs, there >>>>> have been others, I can post them when I see them again too: >>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick1/<filename> library: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>>> with 'user_xattr' flag) >>>>> >>>>> Our setup consists of 2 storage nodes and an arbiter node. I have >>>>> noticed our nodes are on slightly different versions, I'm not sure if this >>>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>>>> pools - total capacity is around 560TB. >>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>>>> with iperf and found that it's what would be expected from this config. >>>>> Individual brick performance seems ok, I've tested several bricks >>>>> using dd and can write a 10GB files at 1.7GB/s. >>>>> >>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>> 10000+0 records in >>>>> 10000+0 records out >>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>> >>>>> Node 1: >>>>> # glusterfs --version >>>>> glusterfs 3.12.15 >>>>> >>>>> Node 2: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Arbiter: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Here is our gluster volume status: >>>>> >>>>> # gluster volume status >>>>> Status of volume: gvAA01 >>>>> Gluster process TCP Port RDMA Port >>>>> Online Pid >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck1 49152 0 Y >>>>> 6931 >>>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck2 49153 0 Y >>>>> 6939 >>>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck3 49154 0 Y >>>>> 6947 >>>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck4 49155 0 Y >>>>> 6956 >>>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck5 49156 0 Y >>>>> 6964 >>>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck6 49157 0 Y >>>>> 6974 >>>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck7 49158 0 Y >>>>> 6984 >>>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck8 49159 0 Y >>>>> 6993 >>>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck9 49160 0 Y >>>>> 7001 >>>>> NFS Server on localhost 2049 0 Y >>>>> 17276 >>>>> Self-heal Daemon on localhost N/A N/A Y >>>>> 25245 >>>>> NFS Server on 02-B 2049 0 Y 9089 >>>>> 
Self-heal Daemon on 02-B N/A N/A Y 17838 >>>>> NFS Server on 00-a 2049 0 Y 15660 >>>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>>> >>>>> Task Status of Volume gvAA01 >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> There are no active volume tasks >>>>> >>>>> And gluster volume info: >>>>> >>>>> # gluster volume info >>>>> >>>>> Volume Name: gvAA01 >>>>> Type: Distributed-Replicate >>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 01-B:/brick1/gvAA01/brick >>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>> Brick11: 02-B:/brick4/gvAA01/brick >>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>> Options Reconfigured: >>>>> cluster.shd-max-threads: 4 >>>>> performance.least-prio-threads: 16 >>>>> cluster.readdir-optimize: on >>>>> performance.quick-read: off >>>>> performance.stat-prefetch: off >>>>> cluster.data-self-heal: on >>>>> cluster.lookup-unhashed: auto >>>>> cluster.lookup-optimize: on >>>>> cluster.favorite-child-policy: mtime >>>>> server.allow-insecure: on >>>>> transport.address-family: inet >>>>> client.bind-insecure: on >>>>> cluster.entry-self-heal: off >>>>> cluster.metadata-self-heal: off >>>>> performance.md-cache-timeout: 600 >>>>> cluster.self-heal-daemon: enable >>>>> performance.readdir-ahead: on >>>>> diagnostics.brick-log-level: INFO >>>>> nfs.disable: off >>>>> >>>>> Thank you for any assistance. >>>>> >>>>> - Patrick >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> [email protected] >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>>
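[Closing aside: a sketch of how the tunings suggested earlier in the thread would be applied against the "Options Reconfigured" list shown above. The values are the ones proposed in the thread (event-threads, io-thread-count, shd-max-threads), not verified recommendations for this workload; as noted earlier, higher thread counts will push CPU use up while heals are running.]

# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 cluster.shd-max-threads 8
# gluster volume get gvAA01 all | grep -E 'event-threads|io-thread-count|shd-max-threads'
(the last command just confirms the new values are in effect)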
