On 09/29/2016 05:18 PM, Sahina Bose wrote:
Yes, this is a GlusterFS problem. Adding gluster users ML

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <[email protected]> wrote:

    Hello

    maybe this is more GlusterFS than oVirt related, but since oVirt
    integrates Gluster management and I'm experiencing the problem in
    an oVirt cluster, I'm writing here.

    The problem is simple: I have a data domain mapped on a replica 3
    arbiter 1 Gluster volume with 6 bricks, like this:

    Status of volume: data_ssd
    Gluster process                                           TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick vm01.storage.billy:/gluster/ssd/data/brick          49153     0          Y       19298
    Brick vm02.storage.billy:/gluster/ssd/data/brick          49153     0          Y       6146
    Brick vm03.storage.billy:/gluster/ssd/data/arbiter_brick  49153     0          Y       6552
    Brick vm03.storage.billy:/gluster/ssd/data/brick          49154     0          Y       6559
    Brick vm04.storage.billy:/gluster/ssd/data/brick          49152     0          Y       6077
    Brick vm02.storage.billy:/gluster/ssd/data/arbiter_brick  49154     0          Y       6153
    Self-heal Daemon on localhost                             N/A       N/A        Y       30746
    Self-heal Daemon on vm01.storage.billy                    N/A       N/A        Y       196058
    Self-heal Daemon on vm03.storage.billy                    N/A       N/A        Y       23205
    Self-heal Daemon on vm04.storage.billy                    N/A       N/A        Y       8246


    Now, I've put the vm04 host into maintenance from oVirt, ticking
    the "Stop gluster" checkbox, and oVirt didn't complain about
    anything. But when I tried to run a new VM it complained about a
    "storage I/O problem", while the data storage domain status was
    always UP.

    Looking in the gluster logs I can see this:

    [2016-09-29 11:01:01.556908] I
    [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change
    in volfile, continuing
    [2016-09-29 11:02:28.124151] E [MSGID: 108008]
    [afr-read-txn.c:89:afr_read_txn_refresh_done]
    0-data_ssd-replicate-1: Failing READ on gfid
    bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
    [Input/output error]
    [2016-09-29 11:02:28.126580] W [MSGID: 108008]
    [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1:
    Unreadable subvolume -1 found with event generation 6 for gfid
    bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
    [2016-09-29 11:02:28.127374] E [MSGID: 108008]
    [afr-read-txn.c:89:afr_read_txn_refresh_done]
    0-data_ssd-replicate-1: Failing FGETXATTR on gfid
    bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
    [Input/output error]
    [2016-09-29 11:02:28.128130] W [MSGID: 108027]
    [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no
    read subvols for (null)
    [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk]
    0-glusterfs-fuse: 8201: READ => -1
    gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210
    (Input/output error)
    [2016-09-29 11:02:28.130824] E [MSGID: 108008]
    [afr-read-txn.c:89:afr_read_txn_refresh_done]
    0-data_ssd-replicate-1: Failing FSTAT on gfid
    bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
    [Input/output error]


Does `gluster volume heal data_ssd info split-brain` report that the file is in split-brain, with vm04 still being down? If yes, could you provide the extended attributes of this gfid from all 3 bricks:

getfattr -d -m . -e hex /path/to/brick/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

If no, then I'm guessing that it is not in actual split-brain (hence the 'Possible split-brain' message). If the node you brought down contains the only good copy of the file (i.e. the other data brick and the arbiter are up, and the arbiter 'blames' that other brick), all I/O is failed with EIO to prevent the file from getting into actual split-brain. The heals will happen when the good node comes back up, and I/O should be allowed again in that case.
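
(For illustration only, a sketch of what that output might look like on the surviving data brick of data_ssd-replicate-1. The brick path assumes the layout from the create command quoted below, the attribute values are made up, and the client indices assume they follow the brick order of that create command; only the trusted.afr.<volname>-client-<index> / trusted.gfid naming and the <brick>/.glusterfs/<xx>/<yy>/<gfid> location are standard GlusterFS conventions.)

# On vm03's data brick (hypothetical output, not from this cluster):
getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
# trusted.afr.data_ssd-client-4=0x000000010000000000000000  <- non-zero data counter: this brick "blames" vm04's brick, i.e. heals are pending only towards the node that is down
# trusted.afr.data_ssd-client-5=0x000000000000000000000000  <- zero: the arbiter is not blamed
# trusted.gfid=0xbf5922b719f34ce398df71e981ecca8d

If, on the other hand, vm02's arbiter_brick showed a non-zero trusted.afr.data_ssd-client-3 (blaming vm03's brick), the only good copy would be the one on the stopped vm04 brick, which is exactly the situation where reads are failed with EIO as described above.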

-Ravi


    [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk]
    0-glusterfs-fuse: 8202: FSTAT()
    /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527
    => -1 (Input/output error)
    The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn]
    0-data_ssd-replicate-1: Unreadable subvolume -1 found with event
    generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d.
    (Possible split-brain)" repeated 11 times between [2016-09-29
    11:02:28.126580] and [2016-09-29 11:02:28.517744]
    [2016-09-29 11:02:28.518607] E [MSGID: 108008]
    [afr-read-txn.c:89:afr_read_txn_refresh_done]
    0-data_ssd-replicate-1: Failing STAT on gfid
    bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
    [Input/output error]

    Now, how is it possible to have a split-brain if I stopped just
    ONE server, which held just ONE of the six bricks, and it was
    cleanly shut down with maintenance mode from oVirt?

    I created the volume originally this way:
    # gluster volume create data_ssd replica 3 arbiter 1
    vm01.storage.billy:/gluster/ssd/data/brick
    vm02.storage.billy:/gluster/ssd/data/brick
    vm03.storage.billy:/gluster/ssd/data/arbiter_brick
    vm03.storage.billy:/gluster/ssd/data/brick
    vm04.storage.billy:/gluster/ssd/data/brick
    vm02.storage.billy:/gluster/ssd/data/arbiter_brick
    # gluster volume set data_ssd group virt
    # gluster volume set data_ssd storage.owner-uid 36 && gluster
    volume set data_ssd storage.owner-gid 36
    # gluster volume start data_ssd
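
(Side note on the layout, in case it helps map the log messages to bricks: with "replica 3 arbiter 1" Gluster groups the bricks three at a time in the order they are listed, and the third brick of each set acts as the arbiter. Reading the create command above that way, the two replica sets should be roughly:

    data_ssd-replicate-0: vm01 brick + vm02 brick + vm03 arbiter_brick
    data_ssd-replicate-1: vm03 brick + vm04 brick + vm02 arbiter_brick

so the data_ssd-replicate-1 errors in the log refer to the second set, where the stopped vm04 held one of the two data copies.)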


    --
    Davide Ferrari
    Senior Systems Engineer

    _______________________________________________
    Users mailing list
    [email protected]
    http://lists.ovirt.org/mailman/listinfo/users



_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
