Re: [Gluster-users] Self-heals gone wild

Ravishankar N Wed, 09 Oct 2019 01:38:21 -0700


On 08/10/19 11:24 pm, Jamie Lawrence wrote:

Hello,


I recently stood up a 3x2 (soon to be 3x3) distribute-replicate volume on 5.9, 
running on CentOS 7.7.

Volume Name: test_stage1_shared
Type: Distributed-Replicate
Volume ID: 99674d15-7dce-480e-b642-eaf7da72c1a1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: sc5-storage-1:/gluster-bricks/pool-1/test_stage1_shared
Brick2: sc5-storage-2:/gluster-bricks/pool-1/test_stage1_shared
Brick3: sc5-storage-3:/gluster-bricks/pool-1/test_stage1_shared
Brick4: sc5-storage-4:/gluster-bricks/pool-1/test_stage1_shared
Brick5: sc5-storage-5:/gluster-bricks/pool-1/test_stage1_shared
Brick6: sc5-storage-6:/gluster-bricks/pool-1/test_stage1_shared
Options Reconfigured:
features.quota-deem-statfs: on
nfs.ports-insecure: on
features.inode-quota: on
features.quota: on
server.allow-insecure: on
nfs.rpc-auth-allow: 10.181.51.190,10.181.43.10,10.181.70.190,10.181.70.191,10.181.70.192
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off


Moved data to it, and after a couple of days of letting it sit, came in this 
morning to a client /var filesystem full of:

[2019-10-08 17:34:31.249186] I 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-test_stage1_shared-replicate-1:  metadata self heal  is successfully 
completed,   metadata self heal from source test_stage1_shared-client-3 to 
test_stage1_shared-client-4,  test_stage1_shared-client-5,  metadata - Pending 
matrix:  [ [ 0 0 0 ] [ 0 0 0 ] [ 0 0 0 ] ], on /sftp/wi/wcs/inprogress
[2019-10-08 17:34:31.260064] I 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-test_stage1_shared-replicate-0:  metadata self heal  is successfully 
completed,   metadata self heal from source test_stage1_shared-client-0 to 
test_stage1_shared-client-1,  test_stage1_shared-client-2,  metadata - Pending 
matrix:  [ [ 0 0 0 ] [ 0 0 0 ] [ 0 0 0 ] ], on /sftp/wi/wcs/sdcom

It looks like your clients are running glusterfs-3.5 or older? 
afr_log_self_heal_completion_status() is a function that existed in the 
really old replication code before it was refactored. Please use a newer 
client, preferably the same version as that of your servers.
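
One rough way to check whether a given client log contains these old-style 
messages is to count log entries by the function name in the bracketed 
source tag. The Python sketch below assumes each log entry sits on a single 
line in the layout shown in the excerpts above (timestamp, level, then 
[file:line:function]). The helper names count_log_functions and 
has_legacy_afr_messages are made up for illustration and are not part of 
any Gluster tooling.

```python
import re
from collections import Counter

# Matches entries shaped like the excerpts in this thread:
# [timestamp] LEVEL [source_file:line:function] message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+(?P<msg>.*)$"
)

# Function name taken from the log excerpt quoted in this thread.
LEGACY_FUNC = "afr_log_self_heal_completion_status"


def count_log_functions(lines):
    """Count log entries per emitting function; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            counts[match.group("func")] += 1
    return counts


def has_legacy_afr_messages(lines):
    """Return True if any entry was emitted by the old self-heal logger."""
    return count_log_functions(lines)[LEGACY_FUNC] > 0


# First entry from the client log above, joined back onto one line.
sample = [
    "[2019-10-08 17:34:31.249186] I "
    "[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] "
    "0-test_stage1_shared-replicate-1:  metadata self heal  is successfully "
    "completed,   metadata self heal from source test_stage1_shared-client-3 to "
    "test_stage1_shared-client-4,  test_stage1_shared-client-5,  metadata - Pending "
    "matrix:  [ [ 0 0 0 ] [ 0 0 0 ] [ 0 0 0 ] ], on /sftp/wi/wcs/inprogress",
]

print(has_legacy_afr_messages(sample))
```

To run it against a real file, pass an open file object instead of the 
sample list, e.g. has_legacy_afr_messages(open(path)), where path is 
wherever the client mount log lives on your system. Running 
"glusterfs --version" on the client is the more direct check of the 
installed client version.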


Hope that helps,

Ravi


On the server peers, I'm seeing things like:

[2019-10-06 12:30:04.636311] W [glusterd-locks.c:586:glusterd_mgmt_v3_lock] 
(-->/usr/lib64/glusterfs/5.9/xlator/mgmt/glusterd.so(+0xe5df0) [0x7f1a23e43df0] 
-->/usr/lib64/glusterfs/5.9/xlator/mgmt/glusterd.so(+0xe5d22) [0x7f1a23e43d22] 
-->/usr/lib64/glusterfs/5.9/xlator/mgmt/glusterd.so(+0xec2cd) [0x7f1a23e4a2cd] ) 
0-management: Lock for test_stage1_shared held by e9fc174c-4156-475e-9d07-36b6c85a364f

But nothing else looks that unusual in those logs. I tried running a `gluster v 
heal` manually, which generated a ton of log noise but apparently little else.

Has anyone seen this before, or have a hint? I'm new to version 5.9, and have 
not run into this problem before.

Thanks,

-j

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
