On 09/29/2016 08:03 PM, Davide Ferrari wrote:
It's strange, I've tried to trigger the error again by putting vm04 in maintenance and stopping the gluster service (from the oVirt GUI) and now the VM starts correctly. Maybe the arbiter indeed blamed the brick that was still up before, but how is that possible?

A write from the client on that file (vm image) could have succeeded only on vm04 even before you brought it down.

The only (maybe big) difference from the previous, erroneous situation is that before, I did maintenance (+ reboot) of 3 of my 4 hosts. Maybe I should have left more time between one reboot and the next?

If you did not do anything since the previous run other than bringing the node back up, and things worked, then the file is not in split-brain. Split-brained files need to be resolved before they can be accessed again, which apparently did not happen in your case.
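For reference, if a file really were in split-brain it would have had to be resolved explicitly before reads succeed again. A minimal sketch of the CLI-based resolution, assuming a GlusterFS release (3.7 or later) where these heal policies are available, and using one of your brick names and a placeholder file path purely as an example:

    # list files the cluster actually considers split-brained
    gluster volume heal data_ssd info split-brain

    # resolve one file by picking a brick as the source of truth
    gluster volume heal data_ssd split-brain source-brick \
        vm02.storage.billy:/gluster/ssd/data/brick \
        /path/to/file/relative/to/the/volume/root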

-Ravi

2016-09-29 14:16 GMT+02:00 Ravishankar N <[email protected]>:

    On 09/29/2016 05:18 PM, Sahina Bose wrote:
    Yes, this is a GlusterFS problem. Adding gluster users ML

    On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari
    <[email protected] <mailto:[email protected]>> wrote:

        Hello

        maybe this is more a GlusterFS than an oVirt issue, but since
        oVirt integrates Gluster management and I'm experiencing the
        problem in an oVirt cluster, I'm writing here.

        The problem is simple: I have a data domain mapped on a
        replica 3 arbiter 1 Gluster volume with 6 bricks, like this:

        Status of volume: data_ssd
        Gluster process                                            TCP Port  RDMA Port  Online  Pid
        ------------------------------------------------------------------------------
        Brick vm01.storage.billy:/gluster/ssd/data/brick           49153     0          Y       19298
        Brick vm02.storage.billy:/gluster/ssd/data/brick           49153     0          Y       6146
        Brick vm03.storage.billy:/gluster/ssd/data/arbiter_brick   49153     0          Y       6552
        Brick vm03.storage.billy:/gluster/ssd/data/brick           49154     0          Y       6559
        Brick vm04.storage.billy:/gluster/ssd/data/brick           49152     0          Y       6077
        Brick vm02.storage.billy:/gluster/ssd/data/arbiter_brick   49154     0          Y       6153
        Self-heal Daemon on localhost                              N/A       N/A        Y       30746
        Self-heal Daemon on vm01.storage.billy                     N/A       N/A        Y       196058
        Self-heal Daemon on vm03.storage.billy                     N/A       N/A        Y       23205
        Self-heal Daemon on vm04.storage.billy                     N/A       N/A        Y       8246


        Now, I've put the vm04 host into maintenance from oVirt,
        ticking the "Stop gluster" checkbox, and oVirt didn't
        complain about anything. But when I tried to run a new VM it
        complained about a "storage I/O problem", while the data
        storage domain status was always UP.

        Looking in the gluster logs I can see this:

        [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
        [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
        [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
        [2016-09-29 11:02:28.127374] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
        [2016-09-29 11:02:28.128130] W [MSGID: 108027] [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read subvols for (null)
        [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output error)
        [2016-09-29 11:02:28.130824] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]


    Does `gluster volume heal data_ssd info split-brain` report that
    the file is in split-brain, with vm04 still being down?
    If yes, could you provide the extended attributes of this gfid
    from all 3 bricks:
    getfattr -d -m . -e hex /path/to/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
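    For context, the output would look roughly like the sketch below; the
    values are purely hypothetical, not taken from this cluster, and the
    client indices simply follow the brick order in the volume, assuming the
    file lives in the second replica set. In each trusted.afr.data_ssd-client-N
    attribute, the first 8 hex digits count pending data operations, the next 8
    metadata operations and the last 8 entry operations that this brick
    believes still need to be applied on brick N, i.e. a non-zero value means
    this brick 'blames' brick N:

    # hypothetical example output
    # file: gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
    trusted.afr.data_ssd-client-3=0x000000000000000000000000
    trusted.afr.data_ssd-client-4=0x000000020000000000000000
    trusted.gfid=0xbf5922b719f34ce398df71e981ecca8d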

    If no, then I'm guessing that it is not in actual split-brain
    (hence the 'Possible split-brain' message). If the node you
    brought down contains the only good copy of the file (i.e. the
    other data brick and arbiter are up, and the arbiter 'blames' this
    other brick), all I/O is failed with EIO to prevent the file from
    getting into actual split-brain. The heals will happen when the
    good node comes back up, and I/O should be allowed again in that case.
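    One way to check whether this is what happened, rather than a real
    split-brain, would be to compare the plain heal listing with the
    split-brain listing, something along these lines:

    # files needing heal, including those only pending a sync from the good copy
    gluster volume heal data_ssd info

    # files the cluster actually considers split-brained
    gluster volume heal data_ssd info split-brain

    A file that shows up in the first list but not in the second only needs
    healing once vm04 is back, not manual split-brain resolution.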

    -Ravi


        [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
        The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)" repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29 11:02:28.517744]
        [2016-09-29 11:02:28.518607] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]

        Now, how is it possible to have a split-brain if I stopped
        just ONE server, which had just ONE of the six bricks, and it
        was cleanly shut down with maintenance mode from oVirt?

        I created the volume originally this way:
        # gluster volume create data_ssd replica 3 arbiter 1 \
              vm01.storage.billy:/gluster/ssd/data/brick \
              vm02.storage.billy:/gluster/ssd/data/brick \
              vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
              vm03.storage.billy:/gluster/ssd/data/brick \
              vm04.storage.billy:/gluster/ssd/data/brick \
              vm02.storage.billy:/gluster/ssd/data/arbiter_brick
        # gluster volume set data_ssd group virt
        # gluster volume set data_ssd storage.owner-uid 36 && gluster volume set data_ssd storage.owner-gid 36
        # gluster volume start data_ssd
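
        As a side note on the brick order above: with replica 3 arbiter 1,
        every group of three consecutive bricks forms one replica set, the
        third brick being the arbiter. Assuming the sets are numbered in the
        order given (a sketch, easy to confirm with `gluster volume info
        data_ssd`), the layout would be:

        # data_ssd-replicate-0: vm01 brick + vm02 brick, arbiter on vm03
        # data_ssd-replicate-1: vm03 brick + vm04 brick, arbiter on vm02

        which would make data_ssd-replicate-1, the subvolume named in the
        errors above, the set that contains vm04's brick.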

        --
        Davide Ferrari
        Senior Systems Engineer


--
Davide Ferrari
Senior Systems Engineer
