[Gluster-users] Geo-replication big logs and large number of pending files

wodel youchi Tue, 24 Feb 2015 06:38:04 -0800

Hi,
I have a three-node setup (CentOS 7 x64 and glusterfs 3.6.1, both with the
latest updates).
Master: two nodes (g1 and g2) in replicated mode with two volumes, data1 and
data2; each volume is made up of one brick.
Slave: the third node (g3) is for geo-rep, also with two volumes, slavedata1
and slavedata2.
I am using geo-rep with a user geoaccount1 and group geogroup1.
The setup completed successfully and geo-rep started.


Problems:
- After some days, I found geo-rep in a Faulty state. The cause was that /var
was full on g1 and on g3, the slave node. The ssh log file for
geo-replication-slave on g3 had grown to 11 GB and was full of entries like
these:
[2015-02-24 11:29:26.526285] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 
0-slavedata2-client-0: remote operation failed: File exists. Path: 
(<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S15bind9 to ../init.d/bind9)
[2015-02-24 11:29:26.526297] W [fuse-bridge.c:1261:fuse_err_cbk] 
0-glusterfs-fuse: 1100: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d 
=> -1 (File exists)
[2015-02-24 11:29:26.526602] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 
0-slavedata2-client-0: remote operation failed: File exists. Path: 
(<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S20modules_dep.sh to 
../init.d/modules_dep.sh)
[2015-02-24 11:29:26.526618] W [fuse-bridge.c:1261:fuse_err_cbk] 
0-glusterfs-fuse: 1101: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d 
=> -1 (File exists)
I emptied the log files on both servers, then changed the logrotate
configuration file for geo-rep on all nodes from "rotate 52" to:
    rotate 7
    size 50M
Does geo-rep normally produce such big logs?
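
To see which messages account for most of such a log, the entries can be counted per source function. The following is only a rough sketch: the function name summarize is made up for illustration, and the regular expression assumes the "[timestamp] LEVEL [file:line:function]" layout seen in the excerpt above (wrapped continuation lines are ignored).

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above:
#   [timestamp] LEVEL [source-file:line:function] message...
ENTRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[(?P<src>[^\]]+)\]"
)


def summarize(lines):
    """Count log entries per (level, function) pair.

    `lines` is any iterable of strings, for example an open log file.
    """
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line)
        if match is None:
            # Continuation of a wrapped entry, or text in another format.
            continue
        # src looks like "client-rpc-fops.c:172:client3_3_symlink_cbk";
        # keep only the function name after the last colon.
        func = match.group("src").rsplit(":", 1)[-1]
        counts[(match.group("level"), func)] += 1
    return counts
```

Passing an open log file to summarize() and printing counts.most_common(10) would show whether a handful of repeated warnings, such as the symlink "File exists" ones above, make up the bulk of the file.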

The modifications worked on g1 and g2, but I had a problem on g3:
[root@glustersrv3 logrotate.d]# logrotate -f /etc/logrotate.d/glusterfs-georep
error: skipping 
"/var/log/glusterfs/geo-replication-slaves/967ddac3-af34-4c70-8d2b-eb201ebb645d:gluster%3A%2F%2F127.0.0.1%3Aslavedata1.gluster.log"
 because parent directory has insecure permissions (It's world writable or 
writable by group which is not "root") Set "su" directive in config file to 
tell logrotate which user/group should be used for rotation
So I added the following to /etc/logrotate.d/glusterfs-georep:
    su root geogroup1
And now it seems to work. Is that correct?
After cleaning up the logs, I tried to restart geo-rep but did not succeed: I
got a "no active session between g1 and g3" error, so I had to restart the
glusterfs daemon on all three nodes.
After geo-rep had restarted and its state became stable, I ran a geo-rep
status detail and got this:
[root@glustersrv1 ~]# gluster volume geo-replication data1  
[email protected]::slavedata1 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Active     N/A                  Hybrid Crawl    25784          8191             0                0                  0
glustersrv2.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Passive    N/A                  N/A             0              0                0                0                  0
[root@glustersrv1 ~]# gluster volume geo-replication data2  
[email protected]::slavedata2 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Active     N/A                  Hybrid Crawl    11768408       8191             0                0                  3833
glustersrv2.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Passive    N/A                  N/A             0              0                0                0                  0
What does FILES PENDING mean? This number had not changed one hour after
restarting geo-rep; I thought it would decrease over time, but it didn't.
And what does FILES SKIPPED mean?
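
One way to watch whether these counters move between runs is to parse saved copies of the status output and compare them. This is a rough sketch, not an official tool: the function names are made up, and it assumes each table row sits on a single line with columns separated by two or more spaces, as the status command prints them before any mail wrapping.

```python
import re


def _split_columns(line):
    """Split one table line on runs of two or more spaces."""
    return re.split(r"\s{2,}", line.strip())


def parse_status(text):
    """Turn 'status detail' output into a list of dicts keyed by column name."""
    rows = [
        ln for ln in text.splitlines()
        # Drop blank lines and the dashed separator line.
        if ln.strip() and not set(ln.strip()) <= {"-"}
    ]
    if not rows:
        return []
    header = _split_columns(rows[0])
    return [dict(zip(header, _split_columns(ln))) for ln in rows[1:]]


def pending_summary(text):
    """Map each master node to its (FILES PENDING, FILES SKIPPED) counters."""
    return {
        row["MASTER NODE"]: (int(row["FILES PENDING"]), int(row["FILES SKIPPED"]))
        for row in parse_status(text)
    }
```

Calling pending_summary() on two snapshots taken some time apart and comparing the resulting dictionaries shows directly whether the pending count is stuck at the same value.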
I tried another thing: I stopped geo-rep, then stopped the volumes on g3 and
deleted them. Then I cleaned up the .glusterfs directory on both bricks and
removed all the glusterfs attributes from them with the setfattr command, but
I did not delete my data (files and directories).
Then I recreated the slave volumes, started them, and finally restarted
geo-rep. After initialization and stabilization, I got the same result from
the geo-rep status command: the same values for FILES PENDING and FILES
SKIPPED.
Is that OK? How can I be sure that I have all my data on g3?
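
A generic check that does not rely on the geo-rep counters is to compare the two file trees directly. The sketch below is only an illustration under assumptions: it supposes the master volume and the slave volume are both mounted on one machine (the two root paths are placeholders to be supplied), the function names are made up, and it skips symbolic links. Hashing every file would be slow on a volume with millions of files.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def list_files(root):
    """Return {relative_path: full_path} for regular files under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Symbolic links are skipped; only regular files are compared.
            if os.path.isfile(full) and not os.path.islink(full):
                found[os.path.relpath(full, root)] = full
    return found


def compare_trees(master_root, slave_root):
    """Return (missing_on_slave, differing) as sorted lists of relative paths."""
    master = list_files(master_root)
    slave = list_files(slave_root)
    missing = sorted(set(master) - set(slave))
    differing = sorted(
        rel for rel in set(master) & set(slave)
        if file_digest(master[rel]) != file_digest(slave[rel])
    )
    return missing, differing
```

If compare_trees() returns two empty lists for the mounted data1 and slavedata1 trees, every regular file on the master is present on the slave with identical content at the time of the check.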

thanks in advance

