OK, I have found a way to get back to “ChangeLog”… This might be related to 
the similar thread we have going regarding the method for setting up the 
initial geo-replication session. It seems as though when geo-replication was set up 
on my cluster, it tried to open the changelog fifo, but it wasn’t there.

In order to fix this, I had to do the following:


  *   Stop geo-replication
  *   Stop volume
  *   Start volume
  *   Change geo-replication “change_detector” to changelog
  *   Start geo-replication
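For reference, the steps above map roughly onto the following commands. This is only a sketch using the volume name (gluster-poc) and slave (10.10.10.120::gluster-poc) from my session below; adjust for your own setup:

```shell
# Stop the geo-replication session before touching the volume
gluster volume geo-replication gluster-poc 10.10.10.120::gluster-poc stop

# Bounce the volume so the changelog socket/fifo gets recreated
gluster volume stop gluster-poc
gluster volume start gluster-poc

# Point the session back at the changelog change detection mechanism
gluster volume geo-replication gluster-poc 10.10.10.120::gluster-poc config change_detector changelog

# Restart the session and watch the crawl status go from Hybrid to Changelog
gluster volume geo-replication gluster-poc 10.10.10.120::gluster-poc start
gluster volume geo-replication gluster-poc 10.10.10.120::gluster-poc status detail
```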

Once I did that, it went to Hybrid mode first, then changed to ChangeLog mode.

-CJ

From: CJ Beck <[email protected]>
Date: Thursday, May 1, 2014 at 10:28 AM
To: Venky Shankar <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Gluster-users] Question about geo-replication and deletes in 3.5 beta train

I just noticed this, which might be related to the change to xsync?

[root@dev604 eafea2c974a3c29ecfbf48cea274dc23]# more changes.log
[2014-04-30 15:45:27.807181] I [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-eafea2c974a3c29ecfbf48cea274dc23.sock (brick: /data/sac-poc)
[2014-04-30 15:45:27.807257] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2014-04-30 15:45:29.807404] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 2/5...
[2014-04-30 15:45:31.807607] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 3/5...
[2014-04-30 15:45:33.807818] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...
[2014-04-30 15:45:35.808038] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...
[2014-04-30 15:45:37.808239] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...

From: CJ Beck <[email protected]>
Date: Wednesday, April 30, 2014 at 2:50 PM
To: Venky Shankar <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Gluster-users] Question about geo-replication and deletes in 3.5 beta train

I just got back to testing this, and for some reason, on my “freshly” created 
cluster and geo-replication session, it’s defaulting to “Hybrid Mode”. It also 
seems to keep bouncing back to xsync as the change method.

Geo-replication log:
[root@dev604 gluster-poc]# egrep -i 'changelog|xsync' *
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:27.763072] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up xsync change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:27.765294] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up changelog change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:27.768302] I [master(/data/gluster-poc):1103:register] _GMaster: xsync 
temp directory: 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:37.808617] I [master(/data/gluster-poc):682:fallback_xsync] _GMaster: 
falling back to xsync mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:52.113879] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up xsync change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:52.116525] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up xsync change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:52.120129] I [master(/data/gluster-poc):1103:register] _GMaster: xsync 
temp directory: 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:52.120604] I [master(/data/gluster-poc):1103:register] _GMaster: xsync 
temp directory: 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:45:54.146847] I [master(/data/gluster-poc):1133:crawl] _GMaster: processing 
xsync changelog 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync/XSYNC-CHANGELOG.1398872752
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:47:08.204514] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up xsync change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:47:08.206767] I [master(/data/gluster-poc):58:gmaster_builder] <top>: 
setting up xsync change detection mode
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:47:08.210570] I [master(/data/gluster-poc):1103:register] _GMaster: xsync 
temp directory: 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:47:08.211069] I [master(/data/gluster-poc):1103:register] _GMaster: xsync 
temp directory: 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync
ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc.log:[2014-04-30
 15:47:09.247109] I [master(/data/gluster-poc):1133:crawl] _GMaster: processing 
xsync changelog 
/var/run/gluster/gluster-poc/ssh%3A%2F%2Froot%4010.10.10.120%3Agluster%3A%2F%2F127.0.0.1%3Agluster-poc/eafea2c974a3c29ecfbf48cea274dc23/xsync/XSYNC-CHANGELOG.1398872828


[root@dev604 gluster-poc]# gluster volume geo-replication gluster-poc 10.10.10.120::gluster-poc status detail

MASTER NODE          MASTER VOL     MASTER BRICK         SLAVE                        STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dev604.domain.com    gluster-poc    /data/gluster-poc    10.10.10.120::gluster-poc    Active     N/A                  Hybrid Crawl    0              323              0                0                  0
dev606.domain.com    gluster-poc    /data/gluster-poc    10.10.10.122::gluster-poc    Passive    N/A                  N/A             0              0                0                0                  0
dev605.domain.com    gluster-poc    /data/gluster-poc    10.10.10.121::gluster-poc    Passive    N/A                  N/A             0              0                0                0                  0



From: Venky Shankar <[email protected]>
Date: Wednesday, April 23, 2014 at 12:09 PM
To: CJ Beck <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Gluster-users] Question about geo-replication and deletes in 3.5 beta train

That should not happen. After a replica failover, the "now" active node should 
continue where the "old" active node left off.

Could you provide geo-replication logs from master and slave after reproducing 
this (with changelog mode)?

Thanks,
-venky


On Thu, Apr 17, 2014 at 9:34 PM, CJ Beck <[email protected]> wrote:
I did set it intentionally, because I found a case where files would be missed 
during geo-replication, and xsync seemed to handle it better. The issue was: 
when you bring down the “Active” node that is handling the geo-replication 
session, and it’s set to ChangeLog as the change method, any files that are 
written into the cluster while geo-replication is down (e.g., while the 
geo-replication session is being failed over to another node) are missed/skipped 
and won’t ever be transferred to the other cluster.

Is this the expected behavior? If not, then I can open a bug on it.

-CJ

From: Venky Shankar <[email protected]>
Date: Wednesday, April 16, 2014 at 4:43 PM
To: CJ Beck <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Gluster-users] Question about geo-replication and deletes in 3.5 beta train


On Thu, Apr 17, 2014 at 3:01 AM, CJ Beck <[email protected]> wrote:
I did have the “change_detector” set to xsync, which seems to be the issue 
(bypassing the changelog method). So I can fix that and see if the deletes are 
propagated.

Was that set intentionally? Setting this as the main change detection 
mechanism would crawl the filesystem every 60 seconds to replicate the changes. 
Changelog mode handles live changes, so any deletes that were performed before 
this option was set would not be propagated.
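[Archive note: the effective setting can be checked, and switched back, with the geo-rep config command. This is a sketch assuming the 3.5-era syntax used elsewhere in this thread, with the thread's volume and slave names:]

```shell
# Print the currently effective change detection mechanism
gluster volume geo-replication test-poc 10.10.1.120::test-poc config change_detector

# Switch back to the changelog mechanism
gluster volume geo-replication test-poc 10.10.1.120::test-poc config change_detector changelog
```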


Also, is there a way to tell geo-replication to go ahead and walk the 
filesystems and do a “sync”, so that files on the remote side are deleted if 
they are not on the source?

As of now, no. With distributed geo-replication, the geo-rep daemon crawls the 
bricks (instead of the mount). Since a brick holds only a subset of the file 
system entities (e.g., in a distributed volume), it's hard to find purged 
entries without crawling the mount and comparing the entries between master and 
slave (which is slow). This is where changelog mode helps.


Thanks for the quick reply!

[root@host ~]# gluster volume geo-replication test-poc 10.10.1.120::test-poc status detail

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
host1.com      test-poc      /data/test-poc    10.10.1.120::test-poc    Passive    N/A                  N/A             382            0                0                0                  0
host2.com      test-poc      /data/test-poc    10.10.1.122::test-poc    Passive    N/A                  N/A             0              0                0                0                  0
host3.com      test-poc      /data/test-poc    10.10.1.121::test-poc    Active     N/A                  Hybrid Crawl    10765          70               0                0                  0


From: Venky Shankar <[email protected]>
Date: Wednesday, April 16, 2014 at 1:54 PM
To: CJ Beck <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Gluster-users] Question about geo-replication and deletes in 3.5 beta train

"ignore-deletes" is only valid in the initial crawl mode[1], where it does not 
propagate deletes to the slave (changelog mode does). Was the session restarted 
by any chance?

[1] Geo-replication now has two internal operations modes: a one shot 
filesystem crawl mode (used to replicate data already present in a volume) and 
the changelog mode (for replicating live changes).

Thanks,
-venky



On Thu, Apr 17, 2014 at 1:25 AM, CJ Beck <[email protected]> wrote:
I have an issue where deletes are not being propagated to the slave cluster in 
a geo-replicated environment. I’ve looked through the code, and it appears this 
behavior might have been hard-coded.

When I try to change it via a config option on the command line, it replies 
with a “reserved option” error:
[root@host ~]# gluster volume geo-replication test-poc 10.10.1.120::test-poc config ignore_deletes 1
Reserved option
geo-replication command failed
[root@host ~]# gluster volume geo-replication test-poc 10.10.1.120::test-poc config ignore-deletes 1
Reserved option
geo-replication command failed
[root@host ~]#

Looking at the source code (although I’m not a C expert by any means), it 
seems as though it’s hard-coded to “true” all the time:

(from glusterd-geo-rep.c):
4285         /* ignore-deletes */
4286         runinit_gsyncd_setrx (&runner, conf_path);
4287         runner_add_args (&runner, "ignore-deletes", "true", ".", ".", NULL);
4288         RUN_GSYNCD_CMD;

Any ideas how to get deletes propagated to the slave cluster?

Thanks!

-CJ

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users



