On 08/05/2014 01:10 PM, Roman wrote:
Ahha! For some reason I was not able to start the VM anymore; Proxmox VE told me it was not able to read the qcow2 header because permission was denied. So I just deleted that file and created a new VM. The next message I got was this:
It seems these are the messages from when you took down the bricks before self-heal completed. Could you restart the run, waiting for self-heals to complete before taking down the next brick?

Pranith


[2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 60 ] [ 11 0 ] ]
[2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2
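For what it's worth, the "Pending matrix" in that log encodes each replica's changelog: entry [i][j] counts operations that brick i believes are still pending on brick j. A minimal, hypothetical sketch (not actual GlusterFS code) of why [ [ 0 60 ] [ 11 0 ] ] is treated as split-brain:

```python
def is_split_brain(pending):
    """pending[i][j] = ops brick i says are still pending on brick j.

    If each brick accuses the other of pending writes, neither copy is
    a clean source, so AFR cannot pick one automatically: split-brain.
    """
    return pending[0][1] > 0 and pending[1][0] > 0

print(is_split_brain([[0, 60], [11, 0]]))  # the matrix from the log: True
print(is_split_brain([[0, 60], [0, 0]]))   # only brick 0 accuses brick 1: False
```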



2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

    I just responded to your earlier mail about how the log looks. The
    message appears in the mount's log file.

    Pranith

    On 08/05/2014 12:41 PM, Roman wrote:
    Ok, so I've waited long enough, I think. There was no traffic on
    the switch ports between the servers, and I could not find any log
    message indicating a completed self-heal (waited about 30 minutes).
    This time I plugged out the other server's UTP cable and got into
    the same situation:
    root@gluster-test1:~# cat /var/log/dmesg
    -bash: /bin/cat: Input/output error

    brick logs:
    [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify]
    0-HA-fast-150G-PVE1-server: disconnecting connectionfrom
    pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
    [2014-08-05 07:09:03.005530] I
    [server-helpers.c:729:server_connection_put]
    0-HA-fast-150G-PVE1-server: Shutting down connection
    pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
    [2014-08-05 07:09:03.005560] I
    [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server:
    fd cleanup on /images/124/vm-124-disk-1.qcow2
    [2014-08-05 07:09:03.005797] I
    [server-helpers.c:617:server_connection_destroy]
    0-HA-fast-150G-PVE1-server: destroyed connection of
    pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0





    2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
    <[email protected]>:

        Do you think it is possible for you to do these tests on the
        latest version, 3.5.2? 'gluster volume heal <volname> info'
        gives you that information in versions > 3.5.1. Otherwise you
        will have to check it from the logs (there will be a
        self-heal-completed message in the mount logs) or by observing
        'getfattr -d -m. -e hex <image-file-on-bricks>'.
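        Concretely, the two checks described above might look like this
        (the volume name and brick path below are this thread's, used
        as placeholders):

```shell
# On >= 3.5.1: list entries that still need healing
gluster volume heal HA-fast-150G-PVE1 info

# On older versions: inspect the AFR changelog xattrs on each brick;
# all-zero trusted.afr.* values mean the file is fully healed
getfattr -d -m . -e hex /brick/path/images/124/vm-124-disk-1.qcow2
```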

        Pranith


        On 08/05/2014 12:09 PM, Roman wrote:
        Ok, I understand. I will try this shortly.
        How can I be sure that the healing process is done if I am
        not able to see its status?


        2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
        <[email protected]>:

            Mounts will do the healing, not the self-heal daemon.
            The point is that whichever process does the healing
            must have the latest information about which bricks are
            good. Since for the VM use case the mounts have the
            latest information, we should let the mounts do the
            healing. If the mount accesses the VM image, either
            through someone doing operations inside the VM or
            through an explicit stat on the file, it will do the
            healing.
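            For example, an explicit stat through the glusterfs mount
            (the path is a guess based on this thread's setup) should
            trigger the client-side heal:

```shell
# Reading the file's metadata via the mount wakes up the client-side
# AFR translator, which compares the replicas and heals if needed
stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2
```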

            Pranith.


            On 08/05/2014 10:39 AM, Roman wrote:
            Hmmm, you told me to turn it off. Did I misunderstand
            something? After I issued the command you sent me, I was
            not able to watch the healing process; it said the file
            won't be healed because healing is turned off.


            2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
            <[email protected]>:

                You didn't mention anything about self-healing. Did
                you wait until the self-heal is complete?

                Pranith

                On 08/04/2014 05:49 PM, Roman wrote:
                Hi!
                The result is pretty much the same. I set the switch
                port down for the 1st server; it was ok. Then I
                brought it back up and set the other server's port
                off, and that triggered an IO error on two virtual
                machines: one with a local root FS but
                network-mounted storage, and the other with a
                network root FS. The 1st gave an error on copying to
                or from the mounted network disk; the other gave me
                an error even for reading log files.

                cat: /var/log/alternatives.log: Input/output error
                Then I reset the KVM VM and it told me there is no
                boot device. Next I virtually powered it off and
                back on, and it booted.

                By the way, did I have to stop and start the volume?

                >> Could you do the following and test it again?
                >> gluster volume set <volname>
                cluster.self-heal-daemon off

                >>Pranith




                2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri
                <[email protected]>:


                    On 08/04/2014 03:33 PM, Roman wrote:
                    Hello!

                    Facing the same problem as mentioned here:

                    
http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html

                    My setup is up and running, so I'm ready to
                    help you with feedback.

                    setup:
                    Proxmox server as client
                    2 physical GlusterFS servers

                    Both server and client side are currently
                    running glusterfs 3.4.4 from the Gluster repo.

                    the problem is:

                    1. created replica bricks.
                    2. mounted in Proxmox (tried both Proxmox ways:
                    via the GUI and via fstab (with the backup
                    volume line); btw, while mounting via fstab I'm
                    unable to launch a VM without cache, even
                    though direct-io-mode is enabled in the fstab
                    line).
                    3. installed a VM.
                    4. brought one volume down - ok.
                    5. brought it back up, waited for the sync to
                    finish.
                    6. brought the other volume down - got IO
                    errors on the VM guest and was not able to
                    restore the VM after resetting it via the host.
                    It says (no bootable media). After I shut it
                    down (forced) and bring it back up, it boots.
                    Could you do the following and test it again?
                    gluster volume set <volname>
                    cluster.self-heal-daemon off
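
                    Spelled out with this thread's volume name (and
                    a heal-wait check that assumes a version where
                    'heal info' reports pending entries), the
                    suggestion might look like:

```shell
# Disable the self-heal daemon so the mount does the healing
gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off

# After bringing a brick back, wait until nothing is left to heal
# before taking the other brick down
while gluster volume heal HA-fast-150G-PVE1 info | grep -q 'Number of entries: [1-9]'; do
    sleep 10
done
```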

                    Pranith

                    Need help. Tried 3.4.3 and 3.4.4.
                    Still missing packages for 3.4.5 for Debian and
                    for 3.5.2 (3.5.1 always gives a healing error
                    for some reason).

                    --
                    Best regards,
                    Roman.


                    _______________________________________________
                    Gluster-users mailing list
                    [email protected]
                    http://supercolony.gluster.org/mailman/listinfo/gluster-users




                --
                Best regards,
                Roman.




            --
            Best regards,
            Roman.




        --
        Best regards,
        Roman.




    --
    Best regards,
    Roman.




--
Best regards,
Roman.

