Thanks Krutika,

I have deleted the volume and created a new one.
I found that it may be an issue with NFS itself: I created a new striped volume, enabled sharding, and mounted it via glusterfs, and it worked just fine; if I mount it with NFS it fails and gives me the same errors.
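For reference, these are roughly the mount commands I am comparing (the IP and mount points are just from my test setup):

# mount -t glusterfs 10.70.0.250:/testv /mnt/testv-fuse        <- works fine
# mount -t nfs -o vers=3,tcp 10.70.0.250:/testv /mnt/testv-nfs <- fails with the errors below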

Respectfully,
Mahdi A. Mahdi

On 03/15/2016 06:24 AM, Krutika Dhananjay wrote:
Hi,

So could you share the xattrs associated with the file at <BRICK_PATH>/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c

Here's what you need to execute:

# getfattr -d -m . -e hex /mnt/b1/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c on the first node and

# getfattr -d -m . -e hex /mnt/b2/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c on the second.
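In case it helps to confirm the mapping between the file and that gfid, you can also read the gfid xattr directly from the file's path on the brick (the file path below is only an example, assuming it is the vmdk from the log):

# getfattr -n trusted.gfid -e hex "/mnt/b1/v/New Virtual Machine_1/New Virtual Machine-flat.vmdk"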


Also, it is normally advised to use a replica 3 volume as opposed to a replica 2 volume to guard against split-brains.
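If you do recreate the volume as replica 3, the creation would look roughly like this (the server names and brick paths are only placeholders):

# gluster volume create <VOLNAME> replica 3 server1:/bricks/b1/v server2:/bricks/b1/v server3:/bricks/b1/v
# gluster volume start <VOLNAME>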

-Krutika

On Mon, Mar 14, 2016 at 3:17 PM, Mahdi Adnan <[email protected]> wrote:

    Sorry for the serial posting, but I got new logs that might help.

    These messages appear during the migration:

    /var/log/glusterfs/nfs.log


    [2016-03-14 09:45:04.573765] I [MSGID: 109036]
    [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal]
    0-testv-dht: Setting layout of /New Virtual Machine_1 with
    [Subvol_name: testv-stripe-0, Err: -1 , Start: 0 , Stop:
    4294967295 , Hash: 1 ],
    [2016-03-14 09:45:04.957499] E [shard.c:369:shard_modify_size_and_block_count]
    (-->/usr/lib64/glusterfs/3.7.8/xlator/cluster/distribute.so(dht_file_setattr_cbk+0x14f) [0x7f27a13c067f]
    -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_common_setattr_cbk+0xcc) [0x7f27a116681c]
    -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_modify_size_and_block_count+0xdd) [0x7f27a116584d] )
    0-testv-shard: Failed to get trusted.glusterfs.shard.file-size for
    c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
    [2016-03-14 09:45:04.957577] W [MSGID: 112199]
    [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: /New
    Virtual Machine_1/New Virtual Machine-flat.vmdk => (XID: 3fec5a26,
    SETATTR: NFS: 22(Invalid argument for operation), POSIX:
    22(Invalid argument)) [Invalid argument]
    [2016-03-14 09:45:05.079657] E [MSGID: 112069]
    [nfs3.c:3649:nfs3_rmdir_resume] 0-nfs-nfsv3: No such file or
    directory: (192.168.221.52:826) testv
    : 00000000-0000-0000-0000-000000000001



    Respectfully,
    Mahdi A. Mahdi

    On 03/14/2016 11:14 AM, Mahdi Adnan wrote:
    So I have deployed a new server "Cisco UCS C220M4" and created a
    new volume:

    Volume Name: testv
    Type: Stripe
    Volume ID: 55cdac79-fe87-4f1f-90c0-15c9100fe00b
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: 10.70.0.250:/mnt/b1/v
    Brick2: 10.70.0.250:/mnt/b2/v
    Options Reconfigured:
    nfs.disable: off
    features.shard-block-size: 64MB
    features.shard: enable
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: enable
    cluster.eager-lock: enable
    performance.stat-prefetch: off
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    performance.readdir-ahead: off

    Same error...

    Can anyone share with me the info of a working striped volume?

    On 03/14/2016 09:02 AM, Mahdi Adnan wrote:
    I have a pool of two bricks on the same server:

    Volume Name: k
    Type: Stripe
    Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: gfs001:/bricks/t1/k
    Brick2: gfs001:/bricks/t2/k
    Options Reconfigured:
    features.shard-block-size: 64MB
    features.shard: on
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: enable
    cluster.eager-lock: enable
    performance.stat-prefetch: off
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    performance.readdir-ahead: off

    Same issue...
    glusterfs 3.7.8 built on Mar 10 2016 20:20:45.


    Respectfully,
    Mahdi A. Mahdi

    Systems Administrator
    IT. Department
    Earthlink Telecommunications

    Cell: 07903316180
    Work: 3352
    Skype: [email protected]
    On 03/14/2016 08:11 AM, Niels de Vos wrote:
    On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
    It would be better to use sharding over stripe for your vm use case. It
    offers better distribution and utilisation of bricks and better heal
    performance.
    And it is well tested.
    Basically the "striping" feature is deprecated, "sharding" is its
    improved replacement. I expect to see "striping" completely dropped in
    the next major release.

    Niels


    Couple of things to note before you do that:
    1. Most of the bug fixes in sharding have gone into 3.7.8. So it is advised
    that you use 3.7.8 or above.
    2. When you enable sharding on a volume, already existing files in the
    volume do not get sharded. Only files that are newly created after
    sharding is enabled will be.
       If you do want to shard the existing files, you would need to cp them
    to a temp name within the volume and then rename them back to the
    original file name (see the example commands just below).
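
    For instance (the file name and mount point are only placeholders), from
    a client mount of the volume:

    # cp /mnt/<vol>/vm1-flat.vmdk /mnt/<vol>/vm1-flat.vmdk.tmp
    # mv /mnt/<vol>/vm1-flat.vmdk.tmp /mnt/<vol>/vm1-flat.vmdk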

    HTH,
    Krutika

    On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <[email protected]> wrote:
    I couldn't find anything related to cache in the HBAs.
    Which logs would be useful in my case? I only see the brick logs, which
    contain nothing during the failure.

    ###
    [2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod]
    0-vmware-posix: mknod on
    /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed
    [File exists]
    [2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod]
    0-vmware-posix: mknod on
    /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed
    [File exists]
    [2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash:
    rmdir issued on /.trashcan/, which is not permitted
    [2016-03-13 18:07:55.027635] I [MSGID: 115056]
    [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR
    /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op)
    ==> (Operation not permitted) [Operation not permitted]
    [2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed
    user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
    [2016-03-13 18:11:34.353463] I [MSGID: 115029]
    [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
    from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version:
    3.7.8)
    [2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed
    user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
    [2016-03-13 18:11:34.591173] I [MSGID: 115029]
    [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
    from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version:
    3.7.8)
    ###

    ESXi just keeps telling me "Cannot clone T: The virtual disk is either
    corrupted or not a supported format.
    error
    3/13/2016 9:06:20 PM
    Clone virtual machine
    T
    VCENTER.LOCAL\Administrator
    "

    My setup is two servers with a floating IP controlled by CTDB, and my ESXi
    host mounts the NFS share via the floating IP.
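
    For reference, the datastore was added on the ESXi hosts with something
    along these lines (the datastore name is just an example):

    # esxcli storage nfs add -H <floating-ip> -s /testv -v testv-ds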





    On 03/13/2016 08:40 PM, pkoelle wrote:

    On 03/13/2016 06:22 PM, David Gossage wrote:

    On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <[email protected]> wrote:

    My HBAs are LSISAS1068E, and the filesystem is XFS.
    I tried EXT4 and it did not help.
    I have created a striped volume on one server with two bricks, same issue.
    I also tried a replicated volume with just sharding enabled, same issue;
    as soon as I disable sharding it works just fine. Neither sharding nor
    striping works for me.
    I did follow up on some of the threads in the mailing list and tried some
    of the fixes that worked for others; none worked for me. :(


    Is it possible the LSI has write-cache enabled?

    Why is that relevant? Even the backing filesystem has no idea if there is
    a RAID or write cache or whatever. There are blocks and sync(), end of
    story.
    If you lose power and screw up your recovery, or do funky stuff with SAS
    multipathing, that might be an issue with a controller cache. AFAIK that's
    not what we are talking about.

    I'm afraid that unless the OP has some logs from the server, a
    reproducible test case, or a backtrace from client or server, this isn't
    getting us anywhere.

    cheers
    Paul


    On 03/13/2016 06:54 PM, David Gossage wrote:
    On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <[email protected]> wrote:

    Okay, so I have enabled shard on my test volume and it did not help;
    stupidly enough, I have enabled it on a production volume
    "Distributed-Replicate" and it corrupted half of my VMs.
    I have updated Gluster to the latest version and nothing seems to have
    changed in my situation.
    Below is the info of my volume:


    I was pointing at the settings in that email as an example of corruption
    fixing. I wouldn't recommend enabling sharding if you haven't gotten the
    base working yet on that cluster. What HBAs are you using, and what is the
    layout of the filesystem for the bricks?


    Number of Bricks: 3 x 2 = 6
    Transport-type: tcp
    Bricks:
    Brick1: gfs001:/bricks/b001/vmware
    Brick2: gfs002:/bricks/b004/vmware
    Brick3: gfs001:/bricks/b002/vmware
    Brick4: gfs002:/bricks/b005/vmware
    Brick5: gfs001:/bricks/b003/vmware
    Brick6: gfs002:/bricks/b006/vmware
    Options Reconfigured:
    performance.strict-write-ordering: on
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: enable
    performance.stat-prefetch: disable
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    cluster.eager-lock: enable
    features.shard-block-size: 16MB
    features.shard: on
    performance.readdir-ahead: off


    On 03/12/2016 08:11 PM, David Gossage wrote:


    On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <[email protected]> wrote:

    Both servers have HBAs, no RAID, and I can set up a replicated or
    dispersed volume without any issues.
    The logs are clean, and when I tried to migrate a VM and got the error,
    nothing showed up in the logs.
    I tried mounting the volume on my laptop and it mounted fine but, if I
    use dd to create a data file, it just hangs and I can't cancel it, and I
    can't unmount it or anything; I just have to reboot.
    The same servers have another volume on other bricks in a distributed
    replica, which works fine.
    I have even tried the same setup in a virtual environment (created two
    VMs, installed Gluster and created a replicated striped volume) and again
    the same thing, data corruption.
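
    The dd test that hangs is roughly the following (the path and size are
    just what I happened to use):

    # dd if=/dev/zero of=/mnt/testvol/ddtest.img bs=1M count=1024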


    I'd look through the mail archives for a topic called "Shard in
    Production", I think. The shard portion may not be relevant, but it does
    discuss certain settings that had to be applied to avoid corruption with
    VMs. You may also want to try disabling performance.readdir-ahead.
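
    Something like this (the volume name is a placeholder):

    # gluster volume set <VOLNAME> performance.readdir-ahead off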



    On 03/12/2016 07:02 PM, David Gossage wrote:



    On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <[email protected]> wrote:

    Thanks David,
    My settings are all defaults; I have just created the pool and started it.
    I have applied the settings you recommended and it seems to be the same
    issue:

    Type: Striped-Replicate
    Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
    Status: Started
    Number of Bricks: 1 x 2 x 2 = 4
    Transport-type: tcp
    Bricks:
    Brick1: gfs001:/bricks/t1/s
    Brick2: gfs002:/bricks/t1/s
    Brick3: gfs001:/bricks/t2/s
    Brick4: gfs002:/bricks/t2/s
    Options Reconfigured:
    performance.stat-prefetch: off
    network.remote-dio: on
    cluster.eager-lock: enable
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    performance.readdir-ahead: on


    Is there a RAID controller perhaps doing any caching?

    In the gluster logs, are any errors being reported during the migration
    process?
    Since they aren't in use yet, have you tested making just mirrored bricks
    using different pairings of servers, two at a time, to see if the problem
    follows a certain machine or network port?




    On 03/12/2016 03:25 PM, David Gossage wrote:



    On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <[email protected]> wrote:

    Dear all,
    I have created a replicated striped volume with two bricks and two
    servers, but I can't use it because when I mount it in ESXi and try to
    migrate a VM to it, the data gets corrupted.
    Does anyone have any idea why this is happening?

    Dell 2950 x2
    Seagate 15k 600GB
    CentOS 7.2
    Gluster 3.7.8

    Appreciate your help.


    Most reports of this I have seen end up being settings related. Post your
    gluster volume info. Below is what I have seen as the most commonly
    recommended settings (the equivalent volume set commands follow the list).
    I'd hazard a guess you may have the read-ahead cache or prefetch on.

    quick-read=off
    read-ahead=off
    io-cache=off
    stat-prefetch=off
    eager-lock=enable
    remote-dio=on
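
    Applied with gluster volume set, these would look roughly like this (the
    volume name is a placeholder):

    # gluster volume set <VOLNAME> performance.quick-read off
    # gluster volume set <VOLNAME> performance.read-ahead off
    # gluster volume set <VOLNAME> performance.io-cache off
    # gluster volume set <VOLNAME> performance.stat-prefetch off
    # gluster volume set <VOLNAME> cluster.eager-lock enable
    # gluster volume set <VOLNAME> network.remote-dio on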


    Mahdi Adnan
    System Admin


_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users