Dear Matthew,

based on my current experience with Gluster geo-replication, and since this
zpool is a key part of your backup procedure (at least that is how it looks
to me), I would create a new one, just to be sure.
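
If you go that route, a rough sketch (pool and device names are only
placeholders here) would be to set the relevant properties on the root
dataset at creation time, so every dataset and brick path inherits them:

    zpool create -O xattr=sa -O acltype=posixacl <pool-name> <devices...>
    zfs get xattr,acltype <pool-name>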


Regards,

Felix


On 05/10/2020 22:28, Matthew Benstead wrote:
Hmm... Looks like I forgot to set the xattrs to sa - I left them at the
default.

[root@pcic-backup01 ~]# zfs get xattr pcic-backup01-zpool
NAME                 PROPERTY  VALUE  SOURCE
pcic-backup01-zpool  xattr     on     default

[root@pcic-backup02 ~]# zfs get xattr pcic-backup02-zpool
NAME                 PROPERTY  VALUE  SOURCE
pcic-backup02-zpool  xattr     on     default

I wonder if I can change them and continue, or if I need to blow away
the zpool and start over?
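
For reference, changing it in place would just be a property set on each
pool - though as far as I understand, xattr=sa only applies to xattrs
written after the change, so anything already synced keeps the old
directory-based xattr format:

    zfs set xattr=sa pcic-backup01-zpool    # on pcic-backup01
    zfs set xattr=sa pcic-backup02-zpool    # on pcic-backup02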

Thanks,
 -Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matth...@uvic.ca

On 10/5/20 12:53 PM, Felix Kölzow wrote:

Dear Matthew,

this is our configuration:

zfs get all mypool

mypool  xattr    sa        local
mypool  acltype  posixacl  local


Anything more to consider?


Regards,

Felix



On 05/10/2020 21:11, Matthew Benstead wrote:
Thanks Felix - looking through some more of the logs I may have
found the reason...

From
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/mnt-data-storage_a-storage.log

[2020-10-05 18:13:35.736838] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:18:53.885591] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:22:14.405234] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:25:53.971679] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:31:44.571557] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:36:36.508772] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:40:10.401055] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:42:57.833536] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:45:19.691953] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:48:26.478532] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage
[2020-10-05 18:52:24.466914] E [fuse-bridge.c:4288:fuse_xattr_cbk]
0-glusterfs-fuse: extended attribute not supported by the backend
storage


The slave nodes are running gluster on top of ZFS, but I had
configured ACLs - is there something else missing to make this work
with ZFS?
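
As a quick sanity check (the attribute name below is just a throwaway
example), one can try writing and reading a trusted-namespace xattr
directly on a brick path as root, since that is the namespace gluster
relies on:

    setfattr -n trusted.testattr -v works /pcic-backup01-zpool/brick
    getfattr -n trusted.testattr /pcic-backup01-zpool/brick
    setfattr -x trusted.testattr /pcic-backup01-zpool/brick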

[root@pcic-backup01 ~]# gluster volume info

Volume Name: pcic-backup
Type: Distribute
Volume ID: 7af8a424-f4b6-4405-bba1-0dbafb0fa231
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
Options Reconfigured:
network.ping-timeout: 10
performance.cache-size: 256MB
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.parallel-readdir: on
performance.readdir-ahead: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
nfs.disable: on
features.read-only: off
performance.open-behind: off


[root@pcic-backup01 ~]# zfs get acltype pcic-backup01-zpool
NAME                 PROPERTY  VALUE     SOURCE
pcic-backup01-zpool  acltype   posixacl  local

[root@pcic-backup01 ~]# grep "pcic-backup0" /proc/mounts
pcic-backup01-zpool /pcic-backup01-zpool zfs
rw,seclabel,xattr,posixacl 0 0


[root@pcic-backup02 ~]# zfs get acltype pcic-backup02-zpool
NAME                 PROPERTY  VALUE     SOURCE
pcic-backup02-zpool  acltype   posixacl  local

[root@pcic-backup02 ~]# grep "pcic-backup0" /proc/mounts
pcic-backup02-zpool /pcic-backup02-zpool zfs
rw,seclabel,xattr,posixacl 0 0

Thanks,
 -Matthew


--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matth...@uvic.ca

On 10/5/20 1:39 AM, Felix Kölzow wrote:
Dear Matthew,


Can you provide more information regarding the geo-replication brick logs?

These files are also located in:

/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/


Usually, these log files are more precise and make it easier to figure out
the root cause of the error.

Additionally, it is also worth looking at the log files on the slave side.
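
For example (the slave-side directory name may differ slightly between
versions, so check what actually exists under /var/log/glusterfs/ there):

    # on the master nodes
    tail -f /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/*.log

    # on the slave nodes
    ls /var/log/glusterfs/geo-replication-slaves/
    tail -f /var/log/glusterfs/geo-replication-slaves/<session-dir>/*.log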


Regards,

Felix


On 01/10/2020 23:08, Matthew Benstead wrote:
Hello,

I'm looking for some help with a GeoReplication Error in my Gluster
7/CentOS 7 setup. Replication progress has basically stopped, and the
status of the replication keeps switching.

The gsyncd log has errors like "Operation not permitted" and "incomplete
sync" - can anyone help? I'm not sure how to proceed with troubleshooting
this.

The log is here, it basically just repeats - from:
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log


[2020-10-01 20:52:15.291923] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
duration=32.8466        num_files=1749  job=3 return_code=23
[2020-10-01 20:52:18.700062] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
duration=43.1210        num_files=3167  job=6 return_code=23
[2020-10-01 20:52:23.383234] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:28.537657] E [repce(worker
/data/storage_b/storage):213:__call__] RepceClient: call failed
call=258187:140538843596608:1601585515.63 method=entry_ops
error=OSError
[2020-10-01 20:52:28.538064] E [syncdutils(worker
/data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line
332,
in main
     func(args)
   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py",
line 86,
in subcmd_worker
     local.service_loop(remote)
   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
1308, in service_loop
     g1.crawlwrap(oneshot=True, register_time=register_time)
   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
602,
in crawlwrap
     self.crawl()
   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
1682,
in crawl
     self.process([item[1]], 0)
   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
1327,
in process
     self.process_change(change, done, retry)
   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
1221,
in process_change
     failures = self.slave.server.entry_ops(entries)
   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
232, in
__call__
     return self.ins(self.meth, *a)
   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
214, in
__call__
     raise res
OSError: [Errno 1] Operation not permitted
[2020-10-01 20:52:28.570316] I [repce(agent
/data/storage_b/storage):96:service_loop] RepceServer: terminating on
reaching EOF.
[2020-10-01 20:52:28.613603] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-10-01 20:52:29.619797] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
duration=5.6458 num_files=455   job=3   return_code=23
[2020-10-01 20:52:38.286245] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
duration=14.1824        num_files=1333  job=2 return_code=23
[2020-10-01 20:52:38.628156] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Initializing...
[2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor]
Monitor:
starting gsyncd worker   brick=/data/storage_b/storage
slave_node=10.0.231.82
[2020-10-01 20:52:38.684736] I [gsyncd(agent
/data/storage_b/storage):318:main] <top>: Using session config
file
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf

[2020-10-01 20:52:38.687213] I [gsyncd(worker
/data/storage_b/storage):318:main] <top>: Using session config
file
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf

[2020-10-01 20:52:38.687401] I [changelogagent(agent
/data/storage_b/storage):72:__init__] ChangelogAgent: Agent
listining...
[2020-10-01 20:52:38.703295] I [resource(worker
/data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH
connection between master and slave...
[2020-10-01 20:52:40.388372] I [resource(worker
/data/storage_b/storage):1435:connect_remote] SSH: SSH connection
between master and slave established. duration=1.6849
[2020-10-01 20:52:40.388582] I [resource(worker
/data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster
volume
locally...
[2020-10-01 20:52:41.501105] I [resource(worker
/data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster
volume
duration=1.1123
[2020-10-01 20:52:41.501405] I [subcmds(worker
/data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2020-10-01 20:52:43.531146] I [master(worker
/data/storage_b/storage):1640:register] _GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage

[2020-10-01 20:52:43.533953] I [resource(worker
/data/storage_b/storage):1291:service_loop] GLUSTER: Register time
time=1601585563
[2020-10-01 20:52:43.547092] I [gsyncdstatus(worker
/data/storage_b/storage):281:set_active] GeorepStatus: Worker Status
Change status=Active
[2020-10-01 20:52:43.561920] I [gsyncdstatus(worker
/data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
Crawl Status Change     status=History Crawl
[2020-10-01 20:52:43.562184] I [master(worker
/data/storage_b/storage):1554:crawl] _GMaster: starting history
crawl     turns=1 stime=None      entry_stime=None etime=1601585563
[2020-10-01 20:52:43.562269] I [resource(worker
/data/storage_b/storage):1307:service_loop] GLUSTER: No stime
available,
using xsync crawl
[2020-10-01 20:52:43.569799] I [master(worker
/data/storage_b/storage):1670:crawl] _GMaster: starting hybrid
crawl      stime=None
[2020-10-01 20:52:43.573528] I [gsyncdstatus(worker
/data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
Crawl Status Change     status=Hybrid Crawl
[2020-10-01 20:52:44.370985] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
duration=20.4307        num_files=2609  job=5 return_code=23
[2020-10-01 20:52:49.431854] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:54.801500] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
duration=72.7492        num_files=4227  job=2 return_code=23
[2020-10-01 20:52:56.766547] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
duration=74.3569        num_files=4674  job=5 return_code=23
[2020-10-01 20:53:18.853333] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
duration=28.7125        num_files=4397  job=3 return_code=23
[2020-10-01 20:53:21.224921] W [master(worker
/data/storage_a/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs     files=['CHANGELOG.1601044033',
'CHANGELOG.1601044048', 'CHANGELOG.1601044063',
'CHANGELOG.1601044078',
'CHANGELOG.1601044093', 'CHANGELOG.1601044108',
'CHANGELOG.1601044123']
[2020-10-01 20:53:22.134536] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
duration=0.2159 num_files=3     job=3   return_code=23
[2020-10-01 20:53:25.615712] I [master(worker
/data/storage_b/storage):1681:crawl] _GMaster: processing xsync
changelog
path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563

[2020-10-01 20:53:25.634970] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs files=['XSYNC-CHANGELOG.1601585397']

GeoReplication status - see it change from Active to Faulty:

[root@storage01 ~]# gluster volume geo-replication status

MASTER NODE    MASTER VOL    MASTER BRICK SLAVE USER
SLAVE                                        SLAVE NODE     STATUS
CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

10.0.231.91    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Changelog Crawl    2020-09-25 07:26:57
10.0.231.91    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
Hybrid Crawl       N/A
10.0.231.91    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
History Crawl      2020-09-23 01:56:05
10.0.231.92    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Changelog Crawl    2020-09-25 06:55:57
10.0.231.93    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Hybrid Crawl       N/A

[root@storage01 ~]# gluster volume geo-replication status

MASTER NODE    MASTER VOL    MASTER BRICK SLAVE USER
SLAVE                                        SLAVE NODE     STATUS
CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

10.0.231.91    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Changelog Crawl    2020-09-25 07:26:57
10.0.231.91    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
Hybrid Crawl       N/A
10.0.231.91    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup N/A            Faulty
N/A                N/A
10.0.231.92    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
History Crawl      2020-09-23 01:58:05
10.0.231.92    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.82    Active
Hybrid Crawl       N/A
10.0.231.92    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup N/A            Faulty
N/A                N/A
10.0.231.93    storage       /data/storage_c/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Changelog Crawl    2020-09-25 06:58:56
10.0.231.93    storage       /data/storage_b/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup 10.0.231.81    Active
Hybrid Crawl       N/A
10.0.231.93    storage       /data/storage_a/storage geoaccount
ssh://geoaccount@10.0.231.81::pcic-backup N/A            Faulty
N/A                N/A


Cluster information: (Note - performance.open-behind is disabled to work
around https://github.com/gluster/glusterfs/issues/1440)
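
For reference, that option is toggled with the standard volume-set command:

    gluster volume set storage performance.open-behind off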

[root@storage01 ~]# gluster --version | head -1; cat
/etc/centos-release; uname -r
glusterfs 7.7
CentOS Linux release 7.8.2003 (Core)
3.10.0-1127.10.1.el7.x86_64

[root@storage01 ~]# df -h /storage2/
Filesystem            Size  Used Avail Use% Mounted on
10.0.231.91:/storage  328T  228T  100T  70% /storage2

[root@storage01 ~]# gluster volume info

Volume Name: storage
Type: Distributed-Replicate
Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.0.231.91:/data/storage_a/storage
Brick2: 10.0.231.92:/data/storage_b/storage
Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
Brick4: 10.0.231.92:/data/storage_a/storage
Brick5: 10.0.231.93:/data/storage_b/storage
Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
Brick7: 10.0.231.93:/data/storage_a/storage
Brick8: 10.0.231.91:/data/storage_b/storage
Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
network.ping-timeout: 10
features.inode-quota: on
features.quota: on
nfs.disable: on
features.quota-deem-statfs: on
storage.fips-mode-rchecksum: on
performance.readdir-ahead: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 256MB
performance.open-behind: off

Thanks,
  -Matthew
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
