On 09/01/2014 12:56 PM, Roman wrote:
Hmm, I don't know how, but both VMs survived the second server outage :) Still, I haven't seen any message about healing completion anywhere :)
Healing can be performed by:
1) The mount process (see its log under /path/to/mount/log/)
2) The self-heal daemons on either of the bricks (/var/log/glusterfs/glustershd.log)

Check whether there are any messages in either of these logs.
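For example, something along these lines should surface self-heal activity on either node (a rough sketch; the exact log-message wording varies between GlusterFS versions, and the fuse client log is usually under /var/log/glusterfs/, named after the mount point):

# on each brick node: recent self-heal daemon activity for the volume
grep -i "self-heal\|selfheal" /var/log/glusterfs/glustershd.log | tail -n 20

# on the machine that has the mount (substitute your actual client log name)
grep -i "self-heal\|selfheal" /var/log/glusterfs/<mount-point>.log | tail -n 20

# and re-check pending entries from any node
gluster volume heal HA-2TB-TT-Proxmox-cluster info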

Pranith


2014-09-01 10:13 GMT+03:00 Roman <[email protected]>:

    The mount is on the Proxmox machine.

    Here are the logs from disconnection until reconnection:


    [2014-09-01 06:19:38.059383] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on 10.250.0.1:24007 failed (Connection timed out)
    [2014-09-01 06:19:40.338393] W [socket.c:522:__socket_rwv] 0-HA-2TB-TT-Proxmox-cluster-client-0: readv on 10.250.0.1:49159 failed (Connection timed out)
    [2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from 10.250.0.1:49159. Client process will keep trying to connect to glusterd until brick's port is available
    [2014-09-01 06:19:49.196768] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
    [2014-09-01 06:20:05.565444] E [socket.c:2161:socket_connect_finish] 0-HA-2TB-TT-Proxmox-cluster-client-0: connection to 10.250.0.1:24007 failed (No route to host)
    [2014-09-01 06:23:26.607180] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
    [2014-09-01 06:23:26.608032] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
    [2014-09-01 06:23:26.608395] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
    [2014-09-01 06:23:26.608420] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
    [2014-09-01 06:23:26.608606] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
    [2014-09-01 06:23:40.604979] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

    Now there is no healing traffic either. I could try disconnecting the second server now to see whether it fails over. I don't really believe it will :(
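    If you do try it, a minimal way to watch the failover as it happens (assuming the fuse client log lives under /var/log/glusterfs/ and is named after the mount point; adjust to your setup):

    # on the Proxmox node: follow the fuse client log live during the test
    tail -f /var/log/glusterfs/<mount-point>.log

    # from the surviving storage node: confirm which bricks are still online
    gluster volume status HA-2TB-TT-Proxmox-cluster

    # the VM should stay up as long as one replica remains reachable; writes are
    # marked pending and healed once the other brick returns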

    Here are some logs from the stor1 server (the one I disconnected):
    root@stor1:~# cat /var/log/glusterfs/bricks/exports-HA-2TB-TT-Proxmox-cluster-2TB.log
    [2014-09-01 06:19:26.403323] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:26.403399] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/112/vm-112-disk-1.raw
    [2014-09-01 06:19:26.403486] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:29.475318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:29.475373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:36.963318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:36.963373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:40.419298] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:40.419355] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:42.531310] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:19:42.531368] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:23:25.088518] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:25.532734] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:26.608074] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:27.187556] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:27.213890] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:31.222654] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
    [2014-09-01 06:23:52.591365] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:23:52.591447] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: releasing lock on 14f70955-5e1e-4499-b66b-52cd50892315 held by {client=0x7f2494001ed0, pid=0 lk-owner=bc3ddbdbae7f0000}
    [2014-09-01 06:23:52.591568] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
    [2014-09-01 06:23:52.591679] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
    [2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:24:00.741598] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:30:06.010819] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
    [2014-09-01 06:30:08.056059] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:30:08.056127] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:36:54.307743] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
    [2014-09-01 06:36:56.340078] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:36:56.340122] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:46:53.601517] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
    [2014-09-01 06:46:55.624705] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
    [2014-09-01 06:46:55.624793] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0

    The last two lines are pretty unclear. Why did it disconnect?




    2014-09-01 9:41 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:


        On 09/01/2014 12:08 PM, Roman wrote:
        Well, as far as I can tell, the VMs are not much impacted by the healing process. At least the munin server, which runs with a pretty high load (the load average rarely goes below 0.9 :) ), had no problems. To create some more load I made a copy of a 590 MB file on the VM's disk; it took 22 seconds, which is about 27 MB/s, or roughly 214 Mbit/s.
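        A quick sanity check of those figures (my arithmetic, assuming the quoted 590 MB and 22 s):

        echo 'scale=1; 590/22' | bc      # ~26.8 MB/s
        echo 'scale=1; 590/22*8' | bc    # ~214.4 Mbit/s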

        The servers are connected via a 10 Gbit network. The Proxmox client is connected to the servers over a separate 1 Gbps interface; we are thinking of moving that to 10 Gbps as well.

        Here is some heal info output, which is pretty confusing.

        Right after the 1st server restored its connection, it was pretty clear:

        root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
        Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
        /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
        Number of entries: 1

        Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
        /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
        /images/112/vm-112-disk-1.raw - Possibly undergoing heal
        Number of entries: 2


        Some time later it says:
        root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
        Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
        Number of entries: 0

        Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
        Number of entries: 0

        while I could still see traffic between the servers, and there was still no message about the healing process completing.
        On which machine do we have the mount?

        Pranith




        2014-08-29 10:02 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

            Wow, this is great news! Thanks a lot for sharing the results :-). Did you get a chance to test the performance of the applications in the VM during self-heal?
            May I know more about your use case? i.e., how many VMs, and what is the average size of each VM, etc.?

            Pranith


            On 08/28/2014 11:27 PM, Roman wrote:
            Here are the results.
            1. I still have a problem with log rotation: logs are being written to the .log.1 file, not the .log file. Any hints on how to fix this? (A possible workaround is sketched at the end of this message.)
            2. The healing logs are now much better; I can see the success message.
            3. Both volumes, with HD off and with HD on, synced successfully; the volume with HD on synced much faster.
            4. Both VMs on the volumes survived the outage, with new files being added and deleted during the outage.

            So replication works well for VM volumes with HD both on and off, and with HD on it is even faster. We still need to solve the logging issue.

            Seems we could start the production storage from this moment :) The whole company will use it, some volumes distributed and some replicated. Thanks for a great product.
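            On the log rotation issue in point 1: one generic workaround (a sketch, not the official packaged config) is to let logrotate truncate the files in place instead of renaming them, so the glusterfs processes keep writing to the same file:

            # /etc/logrotate.d/glusterfs-local  (adjust paths and retention as needed)
            /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
                weekly
                rotate 4
                compress
                missingok
                notifempty
                copytruncate
            }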


            2014-08-27 16:03 GMT+03:00 Roman <[email protected]>:

                Installed the new packages. Will run some tests tomorrow. Thanks.


                2014-08-27 14:10 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:


                    On 08/27/2014 04:38 PM, Kaleb KEITHLEY wrote:

                        On 08/27/2014 03:09 AM, Humble Chirammal wrote:



                            ----- Original Message -----
                            | From: "Pranith Kumar Karampuri" <[email protected]>
                            | To: "Humble Chirammal" <[email protected]>
                            | Cc: "Roman" <[email protected]>, [email protected], "Niels de Vos" <[email protected]>
                            | Sent: Wednesday, August 27, 2014 12:34:22 PM
                            | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
                            |
                            |
                            | On 08/27/2014 12:24 PM, Roman wrote:
                            | > root@stor1:~# ls -l /usr/sbin/glfsheal
                            | > ls: cannot access /usr/sbin/glfsheal: No such file or directory
                            | > Seems like not.
                            | Humble,
                            |       Seems like the binary is still not packaged?

                            Checking with Kaleb on this.

                        ...

                            | >>> |
                            | >>> | Humble/Niels,
                            | >>> |     Do we have debs available for 3.5.2? In 3.5.1 there was packaging
                            | >>> | issue where /usr/bin/glfsheal is not packaged along with the deb. I
                            | >>> | think that should be fixed now as well?
                            | >>> |
                            | >>> Pranith,
                            | >>>
                            | >>> The 3.5.2 packages for debian is not available yet. We
                            | >>> are co-ordinating internally to get it processed.
                            | >>> I will update the list once its available.
                            | >>>
                            | >>> --Humble


                        glfsheal isn't in our 3.5.2-1 DPKGs either.
                        We (meaning I) started with the 3.5.1
                        packaging bits from Semiosis. Perhaps he
                        fixed 3.5.1 after giving me his bits.

                        I'll fix it and spin 3.5.2-2 DPKGs.

                    That is great, Kaleb. Please notify semiosis as well, in case he has yet to fix it.

                    Pranith
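                    Once the rebuilt DPKGs land, a quick generic check that the binary made it in (the package name below is an assumption; adjust to how your build splits the packages):

                    # which installed package owns the heal binary, if any
                    dpkg -S /usr/sbin/glfsheal

                    # or list what the server package ships and look for it
                    dpkg -L glusterfs-server | grep glfsheal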


                        --
                        Kaleb









--
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
