On 08/05/2014 02:06 PM, Roman wrote:
Well, it seems like it doesn't see that changes were made to the volume. I created two files, 200 MB and 100 MB (from /dev/zero), after I disconnected the first brick.
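The files were created along these lines (the file names and mount path here are placeholders):

dd if=/dev/zero of=/mnt/pve/HA-fast-150G-PVE1/test1 bs=1M count=200
dd if=/dev/zero of=/mnt/pve/HA-fast-150G-PVE1/test2 bs=1M count=100

Then I connected it back and got these logs: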

[2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
[2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
[2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
[2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1


[2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
This line seems weird to me, to be honest.
I do not see any traffic on the switch interfaces between the gluster servers, which means there is no syncing going on between them. I tried ls -l on the files on the client and the servers to trigger the healing, but it seems to have had no effect. Should I wait longer?
Yes, it should take around 10-15 minutes. Could you provide the output of 'getfattr -d -m. -e hex <file-on-brick>' on both of the bricks?
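
For reference, the output should look something like this (the brick path and volume name are taken from your logs; the second client xattr and the gfid are my assumptions). All-zero trusted.afr.* values on both bricks mean the heal is complete; non-zero values mean heals are still pending:

# getfattr -d -m. -e hex /exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
trusted.gfid=0x<gfid-of-the-file>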

Pranith


2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:


    On 08/05/2014 01:10 PM, Roman wrote:
    Ahha! For some reason I was not able to start the VM anymore;
    Proxmox VE told me that it was not able to read the qcow2
    header because permission was denied. So I just deleted that
    file and created a new VM. And the next message I got was
    this:
    Seems like these are the messages from when you took down the
    bricks before the self-heal. Could you restart the run,
    waiting for self-heals to complete before taking down the
    next brick?
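
    About the split-brain log below: the pending matrix
    [ [ 0 60 ] [ 11 0 ] ] means each brick's changelog accuses
    the other of pending operations (brick-0 says 60 operations
    are pending on brick-1, brick-1 says 11 are pending on
    brick-0), so neither copy can be picked automatically. On 3.4
    the usual manual fix is to keep the good copy and, on the
    brick you decide is stale, remove the file together with its
    .glusterfs hard link, then stat the file from the mount. A
    rough sketch (the gfid path and mount path are placeholders):

    # on the stale brick only, NOT on the good one:
    rm /exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
    rm /exports/fast-test/150G/.glusterfs/<aa>/<bb>/<full-gfid>
    # then trigger the heal from the client mount:
    stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2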

    Pranith



    [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ] ]
    [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2



    2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

        I just responded to your earlier mail about what the log
        looks like. The log appears in the mount's logfile.

        Pranith

        On 08/05/2014 12:41 PM, Roman wrote:
        OK, so I've waited long enough, I think. There was no
        traffic on the switch ports between the servers, and I
        could not find any log message about a completed
        self-heal (I waited about 30 minutes). This time I
        unplugged the other server's UTP cable and ended up in
        the same situation:
        root@gluster-test1:~# cat /var/log/dmesg
        -bash: /bin/cat: Input/output error

        brick logs:
        [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
        [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
        [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
        [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0





        2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

            Do you think it is possible for you to do these tests
            on the latest version, 3.5.2? 'gluster volume heal
            <volname> info' would give you that information in
            versions > 3.5.1. Otherwise you will have to check it
            either from the logs (there will be a self-heal
            completed message in the mount logs) or by observing
            'getfattr -d -m. -e hex <image-file-on-bricks>'.
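
            On 3.5.2 the output would look something like this
            (the second server's address is my assumption, based
            on the first one from your logs):

            # gluster volume heal HA-fast-150G-PVE1 info
            Brick 10.250.0.1:/exports/fast-test/150G
            Number of entries: 0

            Brick 10.250.0.2:/exports/fast-test/150G
            Number of entries: 0

            Zero entries on both bricks means there are no
            pending self-heals.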

            Pranith


            On 08/05/2014 12:09 PM, Roman wrote:
            OK, I understand. I will try this shortly.
            How can I be sure that the healing process is done if
            I am not able to see its status?


            2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

                Mounts will do the healing, not the
                self-heal-daemon. The problem, I feel, is that
                whichever process does the healing must have the
                latest information about the good bricks in this
                use case. Since for the VM use case the mounts
                have the latest information, we should let the
                mounts do the healing. If the mount accesses the
                VM image, either through someone doing operations
                inside the VM or through an explicit stat on the
                file, it should do the healing.
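
                For example, an explicit stat from the client
                mount should be enough to kick off the heal (the
                mount path here is illustrative):

                # run on the Proxmox client, on the glusterfs
                # mount, not on a brick:
                stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2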

                Pranith.


                On 08/05/2014 10:39 AM, Roman wrote:
                Hmmm, you told me to turn it off. Did I
                misunderstand something? After I issued the
                command you sent me, I was not able to watch the
                healing process; it said the file won't be healed
                because it's turned off.


                2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

                    You didn't mention anything about
                    self-healing. Did you wait until the
                    self-heal was complete?

                    Pranith

                    On 08/04/2014 05:49 PM, Roman wrote:
                    Hi!
                    The result is pretty much the same. I set the
                    switch port down for the 1st server; that was
                    OK. Then I set it back up and turned the
                    other server's port off, and it triggered an
                    IO error on two virtual machines: one with a
                    local root FS but network-mounted storage,
                    and the other with a network root FS. The 1st
                    gave an error on copying to or from the
                    mounted network disk; the other just gave me
                    an error for even reading log files.

                    cat: /var/log/alternatives.log: Input/output error

                    Then I reset the KVM VM and it told me there
                    was no boot device. Next I virtually powered
                    it off and then back on, and it booted.

                    By the way, did I have to start/stop the volume?

                    >> Could you do the following and test it again?
                    >> gluster volume set <volname> cluster.self-heal-daemon off

                    >> Pranith




                    2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:


                        On 08/04/2014 03:33 PM, Roman wrote:
                        Hello!

                        Facing the same problem as mentioned here:

                        
                        http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html

                        my setup is up and running, so I'm ready
                        to help you back with feedback.

                        setup:
                        proxmox server as client
                        2 gluster physical  servers

                        both the server side and the client side
                        are currently running GlusterFS 3.4.4
                        from the gluster repo.

                        the problem is:

                        1. created replica bricks.
                        2. mounted it in Proxmox (tried both
                        Proxmox ways: via the GUI and via fstab,
                        with the backup volume line; see the
                        example fstab line after this list. By
                        the way, while mounting via fstab I'm
                        unable to launch a VM without cache, even
                        though direct-io-mode is enabled in the
                        fstab line)
                        3. installed a VM
                        4. brought one volume down - OK
                        5. brought it back up, waited for the
                        sync to finish.
                        6. brought the other volume down - got IO
                        errors on the VM guest and was not able
                        to restore the VM after I reset it via
                        the host. It said (no bootable media).
                        After I shut it down (forced) and brought
                        it back up, it booted.
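
                        The fstab line I use looks roughly like
                        this (the addresses and mount point here
                        are placeholders):

                        10.250.0.1:/HA-fast-150G-PVE1 /mnt/pve/HA-fast-150G-PVE1 glusterfs defaults,_netdev,backupvolfile-server=10.250.0.2,direct-io-mode=enable 0 0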
                        Could you do the following and test it again?
                        gluster volume set <volname> cluster.self-heal-daemon off

                        Pranith

                        Need help. Tried 3.4.3 and 3.4.4.
                        Packages for 3.4.5 and 3.5.2 are still
                        missing for Debian (3.5.1 always gives a
                        healing error for some reason).

                        --
                        Best regards,
                        Roman.






                    --
                    Best regards,
                    Roman.




                --
                Best regards,
                Roman.




            --
            Best regards,
            Roman.




        --
        Best regards,
        Roman.




    --
    Best regards,
    Roman.




--
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
