Here is the process for resolving split-brain on a replica 2 volume:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html

It should be pretty much the same for replica 3; you change the xattrs with 
something like:

# setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
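
If I remember the layout right, that value is three big-endian 32-bit counters 
packed together: data, metadata, and entry pending operations, in that order. So 
the example value above (0x00000000 00000001 00000000) marks one pending metadata 
operation against client-0. Before changing anything, dump the current values on 
each brick with something like:

# getfattr -d -m . -e hex /gfs/brick-b/a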

When I try to decide which copy to use, I normally run things like:

# stat /<path to brick>/path/to/file

Check the access and change times of the file on the back-end bricks; I 
normally pick the copy with the latest access/change times.  I'll also check:

# md5sum /<path to brick>/path/to/file

Compare the hashes of the file on both bricks to see if the data actually 
differs.  If the data is the same, choosing the proper replica is easier.
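
For your volume, something like this (assuming passwordless ssh between the 
nodes; hostnames and brick path taken from your volume info below) compares 
both data bricks in one shot:

# for h in virt2 virt3; do ssh $h "md5sum /data/virt_images/brick/fedora27.qcow2"; done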

Any idea how you got into this situation?  Did you have a loss of network 
connectivity?  I see you are using server-side quorum; maybe check the logs for 
any loss of quorum.  I wonder if quorum was lost and some sort of race 
condition was hit:

http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls

"Unlike in client-quorum where the volume becomes read-only when quorum is 
lost, loss of server-quorum in a particular node makes glusterd kill the brick 
processes on that node (for the participating volumes) making even reads 
impossible."

I wonder if the killing of brick processes could have led to some sort of race 
condition where writes were serviced on one brick and the arbiter but not the 
other?
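
A quick way to check for that on each node (assuming the default log location; 
adjust the path if yours differs) is something like:

# grep -i quorum /var/log/glusterfs/glusterd.log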

If you can find a reproducer for this, please open a BZ with it; I have been 
seeing something similar (I think) but haven't been able to run the issue down 
yet.

-b

----- Original Message -----
> From: "Henrik Juul Pedersen" <[email protected]>
> To: [email protected]
> Cc: "Henrik Juul Pedersen" <[email protected]>
> Sent: Wednesday, December 20, 2017 1:26:37 PM
> Subject: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
> 
> Hi,
> 
> I have the following volume:
> 
> Volume Name: virt_images
> Type: Replicate
> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
> Status: Started
> Snapshot Count: 2
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: virt3:/data/virt_images/brick
> Brick2: virt2:/data/virt_images/brick
> Brick3: printserver:/data/virt_images/brick (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> features.barrier: disable
> features.scrub: Active
> features.bitrot: on
> nfs.rpc-auth-allow: on
> server.allow-insecure: on
> user.cifs: off
> features.shard: off
> cluster.shd-wait-qlength: 10000
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> transport.address-family: inet
> server.outstanding-rpc-limit: 512
> 
> After a server reboot (brick 1) a single file has become unavailable:
> # touch fedora27.qcow2
> touch: setting times of 'fedora27.qcow2': Input/output error
> 
> Looking at the split brain status from the client side cli:
> # getfattr -n replica.split-brain-status fedora27.qcow2
> # file: fedora27.qcow2
> replica.split-brain-status="The file is not under data or metadata
> split-brain"
> 
> However, in the client side log, a split brain is mentioned:
> [2017-12-20 18:05:23.570762] E [MSGID: 108008]
> [afr-transaction.c:2629:afr_write_txn_refresh_done]
> 0-virt_images-replicate-0: Failing SETATTR on gfid
> 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed.
> [Input/output error]
> [2017-12-20 18:05:23.576046] W [MSGID: 108027]
> [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no
> read subvols for /fedora27.qcow2
> [2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk]
> 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output
> error)
> 
> = Server side
> 
> No mention of a possible split brain:
> # gluster volume heal virt_images info split-brain
> Brick virt3:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> Brick virt2:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> Brick printserver:/data/virt_images/brick
> Status: Connected
> Number of entries in split-brain: 0
> 
> The info command shows the file:
> # gluster volume heal virt_images info
> Brick virt3:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> Brick virt2:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> Brick printserver:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
> 
> 
> The heal and heal full commands do nothing, and I can't find
> anything in the logs about them trying and failing to fix the file.
> 
> Trying to manually resolve the split brain from cli gives the following:
> # gluster volume heal virt_images split-brain source-brick
> virt3:/data/virt_images/brick /fedora27.qcow2
> Healing /fedora27.qcow2 failed: File not in split-brain.
> Volume heal failed.
> 
> The attrs from virt2 and virt3 are as follows:
> [root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-1=0x000002280000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
> 
> [root@virt3 brick]# getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-2=0x000003ef0000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
> 
> I don't know how to find similar information from the arbiter...
> 
> Versions are the same on all three systems:
> # glusterd --version
> glusterfs 3.12.2
> 
> # gluster volume get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      31202
> 
> I might try upgrading to version 3.13.0 tomorrow, but I want to hear
> you out first.
> 
> How do I fix this? Do I have to manually change the file attributes?
> 
> Also, in the guides for manual resolution through setfattr, all the
> bricks are listed with a "trusted.afr.<volume>-client-<brick>" attribute. But
> in my system (as can be seen above), I only see the other bricks. So
> which attributes should be changed, and into what?
> 
> 
> 
> I hope someone might know a solution. If you need any more information
> I'll try and provide it. I can probably change the virtual machine to
> another image for now.
> 
> Best regards,
> Henrik Juul Pedersen
> LIAB ApS
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
