[2016-03-15 14:12:01.421615] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-v-dht: Setting layout of /New Virtual Machine_2 with [Subvol_name: v-replicate-0, Err: -1 , Start: 0 , Stop: 1431655764 , Hash: 1 ], [Subvol_name: v-replicate-1, Err: -1 , Start: 1431655765 , Stop: 2863311529 , Hash: 1 ], [Subvol_name: v-replicate-2, Err: -1 , Start: 2863311530 , Stop: 4294967295 , Hash: 1 ],
[2016-03-15 14:12:02.001167] I [MSGID: 109066] [dht-rename.c:1413:dht_rename] 0-v-dht: renaming /New Virtual Machine_2/New Virtual Machine.vmdk~ (hash=v-replicate-2/cache=v-replicate-2) => /New Virtual Machine_2/New Virtual Machine.vmdk (hash=v-replicate-2/cache=v-replicate-2)
[2016-03-15 14:12:02.248164] W [MSGID: 112032] [nfs3.c:3622:nfs3svc_rmdir_cbk] 0-nfs: 3fed7d9f: /New Virtual Machine_2 => -1 (Directory not empty) [Directory not empty]
[2016-03-15 14:12:02.259015] W [MSGID: 112032] [nfs3.c:3622:nfs3svc_rmdir_cbk] 0-nfs: 3fed7da3: /New Virtual Machine_2 => -1 (Directory not empty) [Directory not empty]

Respectfully,
Mahdi A. Mahdi

On 03/15/2016 03:03 PM, Krutika Dhananjay wrote:
Hmm ok. Could you share the nfs.log content?

-Krutika

On Tue, Mar 15, 2016 at 1:45 PM, Mahdi Adnan <[email protected]> wrote:

    Okay, here's what I did:

    Volume Name: v
    Type: Distributed-Replicate
    Volume ID: b348fd8e-b117-469d-bcc0-56a56bdfc930
    Status: Started
    Number of Bricks: 3 x 2 = 6
    Transport-type: tcp
    Bricks:
    Brick1: gfs001:/bricks/b001/v
    Brick2: gfs001:/bricks/b002/v
    Brick3: gfs001:/bricks/b003/v
    Brick4: gfs002:/bricks/b004/v
    Brick5: gfs002:/bricks/b005/v
    Brick6: gfs002:/bricks/b006/v
    Options Reconfigured:
    features.shard-block-size: 128MB
    features.shard: enable
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: enable
    cluster.eager-lock: enable
    performance.stat-prefetch: off
    performance.io-cache: off
    performance.read-ahead: off
    performance.quick-read: off
    performance.readdir-ahead: on


    Same error, and mounting using glusterfs still works just fine.

    Respectfully,
    Mahdi A. Mahdi


    On 03/15/2016 11:04 AM, Krutika Dhananjay wrote:
    OK but what if you use it with replication? Do you still see the
    error? I think not.
    Could you give it a try and tell me what you find?

    -Krutika

    On Tue, Mar 15, 2016 at 1:23 PM, Mahdi Adnan <[email protected]> wrote:

        Hi,

        I have created the following volume;

        Volume Name: v
        Type: Distribute
        Volume ID: 90de6430-7f83-4eda-a98f-ad1fabcf1043
        Status: Started
        Number of Bricks: 3
        Transport-type: tcp
        Bricks:
        Brick1: gfs001:/bricks/b001/v
        Brick2: gfs001:/bricks/b002/v
        Brick3: gfs001:/bricks/b003/v
        Options Reconfigured:
        features.shard-block-size: 128MB
        features.shard: enable
        cluster.server-quorum-type: server
        cluster.quorum-type: auto
        network.remote-dio: enable
        cluster.eager-lock: enable
        performance.stat-prefetch: off
        performance.io-cache: off
        performance.read-ahead: off
        performance.quick-read: off
        performance.readdir-ahead: on

        After mounting it in ESXi and trying to clone a VM to it, I got the
        same error.


        Respectfully,
        Mahdi A. Mahdi


        On 03/15/2016 10:44 AM, Krutika Dhananjay wrote:
        Hi,

        Do not use sharding and stripe together in the same volume, because:
        a) It is not recommended, and there is no point in using both. Using
        sharding alone on your volume should work fine.
        b) That combination has not been tested.
        c) As Niels said, the stripe feature is virtually deprecated.

        I would suggest that you create an nx3 volume, where n is the number
        of distribute subvols you prefer, enable the group virt options on it,
        enable sharding on it, set the shard-block-size you feel is
        appropriate, and then just start off with VM image creation etc.
        If you run into any issues even after you do this, let us know and
        we'll help you out.
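
        For example, with n=2 the sequence might look roughly like this (a
        sketch only; the volume name, host names, brick paths and the 64MB
        block size are placeholders to adapt to your setup):

        # gluster volume create vmstore replica 3 \
              host1:/bricks/b1/vmstore host2:/bricks/b1/vmstore host3:/bricks/b1/vmstore \
              host1:/bricks/b2/vmstore host2:/bricks/b2/vmstore host3:/bricks/b2/vmstore
        # gluster volume set vmstore group virt
        # gluster volume set vmstore features.shard on
        # gluster volume set vmstore features.shard-block-size 64MB
        # gluster volume start vmstore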

        -Krutika

        On Tue, Mar 15, 2016 at 1:07 PM, Mahdi Adnan <[email protected]> wrote:

            Thanks Krutika,

            I have deleted the volume and created a new one.
            It may be an issue with NFS itself: I created a new striped volume,
            enabled sharding, and mounted it via glusterfs, and it worked just
            fine; if I mount it with NFS, it fails and gives me the same errors.

            Respectfully,
            Mahdi A. Mahdi

            On 03/15/2016 06:24 AM, Krutika Dhananjay wrote:
            Hi,

            So could you share the xattrs associated with the file at
            <BRICK_PATH>/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c ?

            Here's what you need to execute:

            # getfattr -d -m . -e hex /mnt/b1/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
            on the first node, and

            # getfattr -d -m . -e hex /mnt/b2/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
            on the second.


            Also, it is normally advised to use a replica 3 volume
            as opposed to replica 2 volume to guard against
            split-brains.

            -Krutika

            On Mon, Mar 14, 2016 at 3:17 PM, Mahdi Adnan <[email protected]> wrote:

                Sorry for the serial posting, but I got new logs that might help.

                These messages appear during the migration:

                /var/log/glusterfs/nfs.log


                [2016-03-14 09:45:04.573765] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-testv-dht: Setting layout of /New Virtual Machine_1 with [Subvol_name: testv-stripe-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
                [2016-03-14 09:45:04.957499] E [shard.c:369:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/3.7.8/xlator/cluster/distribute.so(dht_file_setattr_cbk+0x14f) [0x7f27a13c067f] -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_common_setattr_cbk+0xcc) [0x7f27a116681c] -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_modify_size_and_block_count+0xdd) [0x7f27a116584d] ) 0-testv-shard: Failed to get trusted.glusterfs.shard.file-size for c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
                [2016-03-14 09:45:04.957577] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: /New Virtual Machine_1/New Virtual Machine-flat.vmdk => (XID: 3fec5a26, SETATTR: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)) [Invalid argument]
                [2016-03-14 09:45:05.079657] E [MSGID: 112069] [nfs3.c:3649:nfs3_rmdir_resume] 0-nfs-nfsv3: No such file or directory: (192.168.221.52:826) testv : 00000000-0000-0000-0000-000000000001



                Respectfully,
                Mahdi A. Mahdi
                On 03/14/2016 11:14 AM, Mahdi Adnan wrote:
                So I have deployed a new server (Cisco UCS C220M4)
                and created a new volume:

                Volume Name: testv
                Type: Stripe
                Volume ID: 55cdac79-fe87-4f1f-90c0-15c9100fe00b
                Status: Started
                Number of Bricks: 1 x 2 = 2
                Transport-type: tcp
                Bricks:
                Brick1: 10.70.0.250:/mnt/b1/v
                Brick2: 10.70.0.250:/mnt/b2/v
                Options Reconfigured:
                nfs.disable: off
                features.shard-block-size: 64MB
                features.shard: enable
                cluster.server-quorum-type: server
                cluster.quorum-type: auto
                network.remote-dio: enable
                cluster.eager-lock: enable
                performance.stat-prefetch: off
                performance.io-cache: off
                performance.read-ahead: off
                performance.quick-read: off
                performance.readdir-ahead: off

                Same error.

                Can anyone share with me the info of a working striped volume?

                On 03/14/2016 09:02 AM, Mahdi Adnan wrote:
                I have a pool of two bricks on the same server:

                Volume Name: k
                Type: Stripe
                Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
                Status: Started
                Number of Bricks: 1 x 2 = 2
                Transport-type: tcp
                Bricks:
                Brick1: gfs001:/bricks/t1/k
                Brick2: gfs001:/bricks/t2/k
                Options Reconfigured:
                features.shard-block-size: 64MB
                features.shard: on
                cluster.server-quorum-type: server
                cluster.quorum-type: auto
                network.remote-dio: enable
                cluster.eager-lock: enable
                performance.stat-prefetch: off
                performance.io-cache: off
                performance.read-ahead: off
                performance.quick-read: off
                performance.readdir-ahead: off

                Same issue.
                glusterfs 3.7.8 built on Mar 10 2016 20:20:45.


                Respectfully,
                Mahdi A. Mahdi

                Systems Administrator
                IT Department
                Earthlink Telecommunications <https://www.facebook.com/earthlinktele>

                Cell: 07903316180
                Work: 3352
                Skype: [email protected]
                On 03/14/2016 08:11 AM, Niels de Vos wrote:
                On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
                It would be better to use sharding over stripe for your vm use
                case. It offers better distribution and utilisation of bricks
                and better heal performance. And it is well tested.

                Basically the "striping" feature is deprecated, "sharding" is
                its improved replacement. I expect to see "striping" completely
                dropped in the next major release.

                Niels


                Couple of things to note before you do that:
                1. Most of the bug fixes in sharding have gone into 3.7.8, so
                it is advised that you use 3.7.8 or above.
                2. When you enable sharding on a volume, already existing files
                in the volume do not get sharded; only files newly created from
                the time sharding is enabled will be.
                   If you do want to shard the existing files, then you would
                need to cp them to a temp name within the volume, and then
                rename them back to the original file name.
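
                For instance, from a client mount of the volume, the copy and
                rename step might look roughly like this (a sketch only; the
                mount point and file name are placeholders):

                # cd /mnt/vmstore
                # cp big-vm.img big-vm.img.tmp   # the copy is newly created after sharding was enabled, so it gets sharded
                # mv big-vm.img.tmp big-vm.img   # rename it back to the original name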

                HTH,
                Krutika

                On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <[email protected]> wrote:
                I couldn't find anything related to cache in the HBAs.
                What logs are useful in my case? I see only the brick logs,
                which contain nothing during the failure.

                ###
                [2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed [File exists]
                [2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed [File exists]
                [2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash: rmdir issued on /.trashcan/, which is not permitted
                [2016-03-13 18:07:55.027635] I [MSGID: 115056] [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op) ==> (Operation not permitted) [Operation not permitted]
                [2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
                [2016-03-13 18:11:34.353463] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version: 3.7.8)
                [2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
                [2016-03-13 18:11:34.591173] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version: 3.7.8)
                ###

                ESXi just keeps telling me "Cannot clone T: The virtual disk is
                either corrupted or not a supported format."

                error
                3/13/2016 9:06:20 PM
                Clone virtual machine
                T
                VCENTER.LOCAL\Administrator

                My setup is two servers with a floating IP controlled by CTDB,
                and my ESXi server mounts the NFS export via the floating IP.





                On 03/13/2016 08:40 PM, pkoelle wrote:

                On 03/13/2016 at 18:22, David Gossage wrote:

                On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <[email protected]> wrote:

                My HBAs are LSISAS1068E, and the filesystem is XFS.
                I tried EXT4 and it did not help.
                I have created a striped volume on one server with two bricks;
                same issue. I also tried a replicated volume with just sharding
                enabled; same issue. As soon as I disable sharding it works
                just fine, so neither sharding nor striping works for me.
                I did follow up on some of the threads in the mailing list and
                tried some of the fixes that worked for the others; none worked
                for me. :(


                Is it possible the LSI has write-cache enabled?

                Why is that relevant? Even the backing filesystem has no idea
                if there is a RAID or write cache or whatever. There are blocks
                and sync(), end of story.
                If you lose power and screw up your recovery, OR do funky stuff
                with SAS multipathing, that might be an issue with a controller
                cache. AFAIK that's not what we are talking about.

                I'm afraid that unless the OP has some logs from the server, a
                reproducible test case, or a backtrace from client or server,
                this isn't getting us anywhere.

                cheers
                Paul


                On 03/13/2016 06:54 PM, David Gossage wrote:
                On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <[email protected]> wrote:

                Okay, so I have enabled shard on my test volume and it did not
                help. Stupidly enough, I enabled it on a production volume
                (Distributed-Replicate) and it corrupted half of my VMs.
                I have updated Gluster to the latest and nothing seems to have
                changed in my situation.
                Below is the info of my volume:


                I was pointing at the settings in that email as an example for
                fixing corruption. I wouldn't recommend enabling sharding if
                you haven't gotten the base working yet on that cluster. What
                HBAs are you using, and what is the filesystem layout for the
                bricks?


                Number of Bricks: 3 x 2 = 6
                Transport-type: tcp
                Bricks:
                Brick1: gfs001:/bricks/b001/vmware
                Brick2: gfs002:/bricks/b004/vmware
                Brick3: gfs001:/bricks/b002/vmware
                Brick4: gfs002:/bricks/b005/vmware
                Brick5: gfs001:/bricks/b003/vmware
                Brick6: gfs002:/bricks/b006/vmware
                Options Reconfigured:
                performance.strict-write-ordering: on
                cluster.server-quorum-type: server
                cluster.quorum-type: auto
                network.remote-dio: enable
                performance.stat-prefetch: disable
                performance.io-cache: off
                performance.read-ahead: off
                performance.quick-read: off
                cluster.eager-lock: enable
                features.shard-block-size: 16MB
                features.shard: on
                performance.readdir-ahead: off


                On 03/12/2016 08:11 PM, David Gossage wrote:


                On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <[email protected]> wrote:

                Both servers have HBAs, no RAID, and I can set up replicated or
                dispersed volumes without any issues.
                Logs are clean, and when I tried to migrate a VM and got the
                error, nothing showed up in the logs.
                I tried mounting the volume on my laptop and it mounted fine,
                but if I use dd to create a data file it just hangs and I can't
                cancel it, and I can't unmount it or anything; I just have to
                reboot.
                The same servers have another volume on other bricks in a
                distributed replica, which works fine.
                I have even tried the same setup in a virtual environment
                (created two VMs, installed Gluster, and created a replicated
                striped volume) and again the same thing, data corruption.


                I'd look through the mail archives for a topic called "Shard in
                Production", I think. The shard portion may not be relevant,
                but it does discuss certain settings that had to be applied
                with regard to avoiding corruption with VMs. You may also want
                to try disabling performance.readdir-ahead.
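
                For example (a sketch; substitute your actual volume name for
                the placeholder):

                # gluster volume set <VOLNAME> performance.readdir-ahead off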



                On 03/12/2016 07:02 PM, David Gossage wrote:



                On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <[email protected]> wrote:

                Thanks David,
                My settings are all defaults; I have just created the pool and
                started it.
                I have set the settings as you recommended, and it seems to be
                the same issue:

                Type: Striped-Replicate
                Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
                Status: Started
                Number of Bricks: 1 x 2 x 2 = 4
                Transport-type: tcp
                Bricks:
                Brick1: gfs001:/bricks/t1/s
                Brick2: gfs002:/bricks/t1/s
                Brick3: gfs001:/bricks/t2/s
                Brick4: gfs002:/bricks/t2/s
                Options Reconfigured:
                performance.stat-prefetch: off
                network.remote-dio: on
                cluster.eager-lock: enable
                performance.io-cache: off
                performance.read-ahead: off
                performance.quick-read: off
                performance.readdir-ahead: on


                Is there a RAID controller perhaps doing any caching?

                In the gluster logs, are any errors being reported during the
                migration process?
                Since they aren't in use yet, have you tested making just
                mirrored bricks using different pairings of servers, two at a
                time, to see if the problem follows a certain machine or
                network port?
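
                For instance (a sketch only; the volume name and brick paths
                are placeholders), a throwaway 1x2 mirror across one server
                pair:

                # gluster volume create pairtest replica 2 gfs001:/bricks/test/p gfs002:/bricks/test/p
                # gluster volume start pairtest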




                On 03/12/2016 03:25 PM, David Gossage wrote:



                On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <[email protected]> wrote:

                Dears,
                I have created a replicated striped volume with two bricks and
                two servers, but I can't use it because when I mount it in ESXi
                and try to migrate a VM to it, the data gets corrupted.
                Does anyone have any idea why this is happening?

                Dell 2950 x2
                Seagate 15k 600GB
                CentOS 7.2
                Gluster 3.7.8

                Appreciate your help.


                Most reports of this I have seen end up being settings related.
                Post your gluster volume info. Below is what I have seen as the
                most commonly recommended settings.
                I'd hazard a guess you may have the read-ahead cache or
                prefetch on.

                quick-read=off
                read-ahead=off
                io-cache=off
                stat-prefetch=off
                eager-lock=enable
                remote-dio=on
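
                Applied via the CLI, that would look roughly like this (a
                sketch; substitute your actual volume name for the
                placeholder):

                # gluster volume set <VOLNAME> performance.quick-read off
                # gluster volume set <VOLNAME> performance.read-ahead off
                # gluster volume set <VOLNAME> performance.io-cache off
                # gluster volume set <VOLNAME> performance.stat-prefetch off
                # gluster volume set <VOLNAME> cluster.eager-lock enable
                # gluster volume set <VOLNAME> network.remote-dio on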


                Mahdi Adnan
                System Admin


_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
