I was able to run another set of tests this week and reproduced the issue. Going by the extended attributes, I think I ran into the same issue I saw earlier.
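For anyone reading along, the trusted.afr.* values below can be decoded to see what "pending" means here: each value is three big-endian 32-bit counters (pending data, metadata, and entry operations for that peer). A small sketch (the helper name is mine, not a GlusterFS API):

```python
import struct

def decode_afr(hex_value):
    """Split a trusted.afr.* changelog value into its three 32-bit
    big-endian counters: pending data, metadata, and entry operations."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    data, meta, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": meta, "entry": entry}

# Bricks 2 and 3's view of brick 1, from the xattrs below:
print(decode_afr("0x0000125c0000000000000000"))  # {'data': 4700, 'metadata': 0, 'entry': 0}
# Brick 1's view of bricks 2 and 3:
print(decode_afr("0x000000010000000000000000"))  # {'data': 1, 'metadata': 0, 'entry': 0}
```

So brick 1 accuses each of the others of one pending data operation, while they each accuse brick 1 of 0x125c (4700) -- mutual accusation, i.e. split brain.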
Do you think I need to open a bug report?

Brick 1:
trusted.afr.PL2-client-0=0x000000000000000000000000
trusted.afr.PL2-client-1=0x000000010000000000000000
trusted.afr.PL2-client-2=0x000000010000000000000000
trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c

Brick 2:
trusted.afr.PL2-client-0=0x0000125c0000000000000000
trusted.afr.PL2-client-1=0x000000000000000000000000
trusted.afr.PL2-client-2=0x000000000000000000000000
trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c

Brick 3:
trusted.afr.PL2-client-0=0x0000125c0000000000000000
trusted.afr.PL2-client-1=0x000000000000000000000000
trusted.afr.PL2-client-2=0x000000000000000000000000
trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c

[root@ip-172-31-12-218 ~]# gluster volume info

Volume Name: PL1
Type: Replicate
Volume ID: bd351bae-d467-4e8c-bbd2-6a0fe99c346a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.31.38.189:/data/vol1/gluster-data
Brick2: 172.31.16.220:/data/vol1/gluster-data
Brick3: 172.31.12.218:/data/vol1/gluster-data
Options Reconfigured:
cluster.server-quorum-type: server
network.ping-timeout: 12
nfs.addr-namelookup: off
performance.cache-size: 2147483648
cluster.quorum-type: auto
performance.read-ahead: off
performance.client-io-threads: on
performance.io-thread-count: 64
cluster.eager-lock: on
cluster.server-quorum-ratio: 51%

Volume Name: PL2
Type: Replicate
Volume ID: e6ad8787-05d8-474b-bc78-748f8c13700f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.31.38.189:/data/vol2/gluster-data
Brick2: 172.31.16.220:/data/vol2/gluster-data
Brick3: 172.31.12.218:/data/vol2/gluster-data
Options Reconfigured:
nfs.addr-namelookup: off
cluster.server-quorum-type: server
network.ping-timeout: 12
performance.cache-size: 2147483648
cluster.quorum-type: auto
performance.read-ahead: off
performance.client-io-threads: on
performance.io-thread-count: 64
cluster.eager-lock: on
cluster.server-quorum-ratio: 51%
[root@ip-172-31-12-218 ~]#

*Mount command*

Client:
mount -t glusterfs -o defaults,enable-ino32,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log,backupvolfile-server=172.31.38.189,backupvolfile-server=172.31.12.218,background-qlen=256 172.31.16.220:/PL2 /mnt/vm

Server (/etc/fstab):

/dev/xvdf /data/vol1 xfs defaults,inode64,noatime 1 2
/dev/xvdg /data/vol2 xfs defaults,inode64,noatime 1 2

*Packages*

Client:

rpm -qa | grep gluster
glusterfs-fuse-3.5.2-1.el6.x86_64
glusterfs-3.5.2-1.el6.x86_64
glusterfs-libs-3.5.2-1.el6.x86_64

Server:

[root@ip-172-31-12-218 ~]# rpm -qa | grep gluster
glusterfs-3.5.2-1.el6.x86_64
glusterfs-fuse-3.5.2-1.el6.x86_64
glusterfs-api-3.5.2-1.el6.x86_64
glusterfs-server-3.5.2-1.el6.x86_64
glusterfs-libs-3.5.2-1.el6.x86_64
glusterfs-cli-3.5.2-1.el6.x86_64
[root@ip-172-31-12-218 ~]#

On Sat, Sep 6, 2014 at 9:01 AM, Pranith Kumar Karampuri <[email protected]> wrote:
>
> On 09/06/2014 04:53 AM, Jeff Darcy wrote:
>>
>>> I have a replicate glusterfs setup on 3 bricks (replicate = 3). I have
>>> client and server quorum turned on. I rebooted one of the 3 bricks.
>>> When it came back up, the client started throwing error messages that
>>> one of the files went into split brain.
>>
>> This is a good example of how split brain can happen even with all
>> kinds of quorum enabled. Let's look at those xattrs. BTW, thank you
>> for a very nicely detailed bug report which includes those.
>>
>>> BRICK 1
>>> ========
>>> [root@ip-172-31-38-189 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> trusted.afr.PL2-client-0=0x000000000000000000000000
>>> trusted.afr.PL2-client-1=0x000000010000000000000000
>>> trusted.afr.PL2-client-2=0x000000010000000000000000
>>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
>>>
>>> BRICK 2
>>> =======
>>> [root@ip-172-31-16-220 ~]# getfattr -d -m .
>>> -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> trusted.afr.PL2-client-0=0x00000d460000000000000000
>>> trusted.afr.PL2-client-1=0x000000000000000000000000
>>> trusted.afr.PL2-client-2=0x000000000000000000000000
>>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
>>>
>>> BRICK 3
>>> =========
>>> [root@ip-172-31-12-218 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>>> trusted.afr.PL2-client-0=0x00000d460000000000000000
>>> trusted.afr.PL2-client-1=0x000000000000000000000000
>>> trusted.afr.PL2-client-2=0x000000000000000000000000
>>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
>>
>> Here, we see that brick 1 shows a single pending operation for the
>> other two, while they show 0xd46 (3398) pending operations for
>> brick 1. Here's how this can happen.
>>
>> (1) There is exactly one pending operation.
>>
>> (2) Brick1 completes the write first, and says so.
>>
>> (3) Client sends messages to all three, saying to decrement brick1's count.
>>
>> (4) All three bricks receive and process that message.
>>
>> (5) Brick1 fails.
>>
>> (6) Brick2 and brick3 complete the write, and say so.
>>
>> (7) Client tells all bricks to decrement remaining counts.
>>
>> (8) Brick2 and brick3 receive and process that message.
>>
>> (9) Brick1 is dead, so its counts for brick2/3 stay at one.
>>
>> (10) Brick2 and brick3 have quorum, with all-zero pending counters.
>>
>> (11) Client sends 0xd46 more writes to brick2 and brick3.
>>
>> Note that at no point did we lose quorum. Note also the tight timing
>> required.
>> If brick1 had failed an instant earlier, it would not have
>> decremented its own counter. If it had failed an instant later, it
>> would have decremented brick2's and brick3's as well. If brick1 had
>> not finished first, we'd be in yet another scenario. If delayed
>> changelog had been operative, the messages at (3) and (7) would have
>> been combined, leaving us in yet another scenario. As far as I can
>> tell, we would have been able to resolve the conflict in all those
>> cases.
>>
>> *** Key point: quorum enforcement does not totally eliminate split
>> brain. It only makes the frequency a few orders of magnitude lower. ***
>
> Not quite right. After we fixed the bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1066996, the only two
> possible ways to introduce split-brain are:
>
> 1) An implementation bug in changelog xattr marking; I believe that to
> be the case here.
>
> 2) Keep writing to the file from the mount, then:
>    a) take brick1 down, wait until at least one write is successful
>    b) bring brick1 back up and take brick2 down (self-heal should not
>       have happened yet), wait until at least one write is successful
>    c) bring brick2 back up and take brick3 down (self-heal should not
>       have happened yet), wait until at least one write is successful
>
> With the outcast implementation, case 2 will also be immune to
> split-brain errors.
>
> That leaves implementation errors in changelog marking as the only way
> afr can produce split-brains. If we test thoroughly and fix such
> problems, we can make it immune to split-brain :-).
>
> Pranith
>
>> So, is there any way to prevent this completely? Some AFR
>> enhancements, such as the oft-promised "outcast" feature [1], might
>> have helped. NSR [2] is immune to this particular problem. "Policy
>> based split brain resolution" [3] might have resolved it automatically
>> instead of merely flagging it. Unfortunately, those are all in the
>> future.
>> For now, I'd say the best approach is to resolve the conflict
>> manually and try to move on. Unless there's more going on than meets
>> the eye, recurrence should be very unlikely.
>>
>> [1] http://www.gluster.org/community/documentation/index.php/Features/outcast
>> [2] http://www.gluster.org/community/documentation/index.php/Features/new-style-replication
>> [3] http://www.gluster.org/community/documentation/index.php/Features/pbspbr
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
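For what it's worth, the 11-step counter sequence Jeff walks through above can be reproduced with a toy model. This is a deliberate simplification, not AFR's actual pre-op/post-op code (which batches and piggybacks these updates, and stores the counts in the trusted.afr xattrs), but the end state matches the reported xattrs exactly:

```python
BRICKS = ["brick1", "brick2", "brick3"]

# pending[x][y] = writes that brick x still records as outstanding on brick y
pending = {b: {p: 0 for p in BRICKS} for b in BRICKS}
alive = {b: True for b in BRICKS}

def preop():
    # client marks the write as pending for every brick, on every live brick
    for b in BRICKS:
        if alive[b]:
            for p in BRICKS:
                pending[b][p] += 1

def postop(done):
    # client tells every live brick that `done` completed its write
    for b in BRICKS:
        if alive[b]:
            pending[b][done] -= 1

# Steps (1)-(9) of the scenario above:
preop()                  # (1) one write in flight, marked pending everywhere
postop("brick1")         # (2)-(4) brick1 finishes first; all three decrement
alive["brick1"] = False  # (5) brick1 fails
postop("brick2")         # (6)-(8) bricks 2 and 3 finish; only live bricks
postop("brick3")         #         decrement, so brick1's records stay at one

# Steps (10)-(11): 0xd46 more writes succeed on bricks 2 and 3 only,
# each leaving one more undecremented pending count against brick1
for _ in range(0xd46):
    preop()
    postop("brick2")
    postop("brick3")

print(pending["brick1"])  # {'brick1': 0, 'brick2': 1, 'brick3': 1}
print(pending["brick2"])  # {'brick1': 3398, 'brick2': 0, 'brick3': 0}
```

Brick 1 blames the other two for one operation; they blame brick 1 for 0xd46 (3398) -- the mutual-accusation pattern in the getfattr output above.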
