Hi, I did maintenance on the 2 bricks that we have: I added RAM. One of the bricks was down for about 30 minutes and the other one for about 10 minutes. Between the two shutdowns, I only gave gluster a few minutes to heal. I know that many files were still not in sync when I shut down the second brick.
The rest is partly assumption. I know that one of the users was trying to share his zsh history file between multiple Docker containers. He tried to use a single shared file and also tried to use a directory holding multiple history files. My guess is that when I shut down the first node, he created the directory. When I rebooted the first brick and shut down the second one, I most likely did not give the 2 bricks enough time to heal. Then he created the file on the second node. When I rebooted the second brick, Gluster was not able to recover. Would a third brick have prevented this situation? I am not entirely sure.

On Thu, Jun 15, 2017 at 1:43 AM, Mohammed Rafi K C <[email protected]> wrote:

> Can you please explain how we ended up in this scenario? I think that will
> help us understand more about this scenario and why gluster recommends
> replica 3 or arbiter volumes.
>
> Regards
>
> Rafi KC
>
> On 06/15/2017 10:46 AM, Karthik Subrahmanya wrote:
>
> Hi Ludwig,
>
> There is no way to resolve gfid split-brains with type mismatch. You have
> to do it manually by following the steps in [1].
> In case of type mismatch it is recommended to resolve it manually. But for
> a gfid-only mismatch, in 3.11 we have a way to resolve it by using the
> *favorite-child-policy*.
> Since the file is not important, you can go with deleting it.
>
> [1] https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
>
> HTH,
> Karthik
>
> On Thu, Jun 15, 2017 at 8:23 AM, Ludwig Gamache <[email protected]> wrote:
>
>> I am new to gluster but already like it. I did maintenance last week
>> where I shut down both nodes (one after the other). I had many files that
>> needed to be healed after that. Everything worked well, except for 1 file.
>> It is in split-brain, with 2 different GFIDs. I read the documentation but
>> it only covers the cases where the GFID is the same on both bricks. BTW, I
>> am running Gluster 3.10.
>>
>> Here are some details...
>>
>> [root@NAS-01 .glusterfs]# gluster volume heal data01 info
>>
>> Brick 192.168.186.11:/mnt/DATA/data
>> /abc/.zsh_history
>> /abc - Is in split-brain
>> Status: Connected
>> Number of entries: 2
>>
>> Brick 192.168.186.12:/mnt/DATA/data
>> /abc - Is in split-brain
>> /abc/.zsh_history
>> Status: Connected
>> Number of entries: 2
>>
>> On brick 1:
>>
>> [root@NAS-01 abc]# ls -lart
>> total 75
>> drwxr-xr-x.  2 root  root   2 Jun  8 13:26 .zsh_history
>> drwxr-xr-x.  3 12078 root   3 Jun 12 11:36 .
>> drwxrwxrwt. 17 root  root  17 Jun 12 12:20 ..
>>
>> On brick 2:
>>
>> [root@DC-MTL-NAS-02 abc]# ls -lart
>> total 66
>> -rw-rw-r--.  2 12078 12078 1085 Jun 12 04:42 .zsh_history
>> drwxr-xr-x.  2 12078 root     3 Jun 12 10:36 .
>> drwxrwxrwt. 17 root  root    17 Jun 12 11:20 ..
>>
>> Notice that on one brick it is a file and on the other it is a directory.
>>
>> On brick 1:
>>
>> [root@NAS-01 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/DATA/data/abc/.zsh_history
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.data01-client-0=0x000000000000000000000000
>> trusted.afr.data01-client-1=0x000000000000000200000000
>> trusted.gfid=0xdee43407139d41f091d13e106a51f262
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>
>> On brick 2:
>>
>> [root@NAS-02 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/DATA/data/abc/.zsh_history
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.data01-client-0=0x000000170000000200000000
>> trusted.afr.data01-client-1=0x000000000000000000000000
>> trusted.bit-rot.version=0x060000000000000059397acd0005dadd
>> trusted.gfid=0xa70ae9af887a4a37875f5c7c81ebc803
>>
>> Any recommendation on how to recover from that? BTW, the file is not
>> important and I could easily get rid of it without impact. So, if this is
>> an easy solution...
>>
>> Regards,
>>
>> --
>> Ludwig Gamache
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Ludwig Gamache
IT Director - Element AI
4200 St-Laurent, suite 1200
514-704-0564
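[Editor's note for the archive] The trusted.afr.<volume>-client-<n> values in the getfattr output above are 12 bytes each: three big-endian 32-bit counters of pending data, metadata and entry operations that the local brick holds against client-<n>'s brick. A small POSIX shell sketch to decode one (the value is copied from the brick-2 output above):

```shell
# Decode a trusted.afr.* xattr value: 12 bytes = three big-endian
# 32-bit counters of pending (unsynced) operations the local brick
# is blaming on the other brick: data, metadata, entry.
xattr=000000170000000200000000   # trusted.afr.data01-client-0 on brick 2
data=$((     0x$(echo "$xattr" | cut -c1-8)  ))
metadata=$(( 0x$(echo "$xattr" | cut -c9-16) ))
entry=$((    0x$(echo "$xattr" | cut -c17-24) ))
echo "data=$data metadata=$metadata entry=$entry"
# prints: data=23 metadata=2 entry=0
```

Decoded this way, brick 2 is accusing brick 1 of 23 pending data and 2 pending metadata operations, while brick 1 holds its own nonzero counter against brick 2. With both sides blaming each other and the two GFIDs differing, AFR has no basis to pick a winner automatically.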
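[Editor's note for the archive] Since the file was expendable, the manual procedure from [1] boils down to deleting the bad copy and its .glusterfs link on one brick, then letting self-heal recreate the entry. A sketch only, assuming the empty directory on brick 1 is the copy to discard (the GFID is taken from the getfattr output above; verify paths and GFIDs before running anything like this on a live brick):

```shell
# On brick 1 (192.168.186.11): discard the directory variant of the entry.
rm -rf /mnt/DATA/data/abc/.zsh_history

# Also remove its .glusterfs entry; the path is derived from the GFID
# (dee43407-139d-41f0-91d1-3e106a51f262 -> de/e4/<gfid>). For a
# directory this is a symlink, for a regular file a hard link.
rm -f /mnt/DATA/data/.glusterfs/de/e4/dee43407-139d-41f0-91d1-3e106a51f262

# Trigger a heal and confirm the entry has left the split-brain list.
gluster volume heal data01
gluster volume heal data01 info
```

As Karthik notes, on 3.11+ a gfid-only mismatch (without the file/directory type mismatch seen here) could instead be resolved automatically, e.g. with `gluster volume set data01 cluster.favorite-child-policy mtime`.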
