Re: [Gluster-users] libgfapi failover problem on replica bricks

Pranith Kumar Karampuri Mon, 01 Sep 2014 01:26:31 -0700


On 09/01/2014 12:43 PM, Roman wrote:

The mount is on the Proxmox machine.
Here are the logs from disconnection until reconnection:
[2014-09-01 06:19:38.059383] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on 10.250.0.1:24007 failed (Connection timed out)
[2014-09-01 06:19:40.338393] W [socket.c:522:__socket_rwv] 0-HA-2TB-TT-Proxmox-cluster-client-0: readv on 10.250.0.1:49159 failed (Connection timed out)
[2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from 10.250.0.1:49159. Client process will keep trying to connect to glusterd until brick's port is available
[2014-09-01 06:19:49.196768] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
[2014-09-01 06:20:05.565444] E [socket.c:2161:socket_connect_finish] 0-HA-2TB-TT-Proxmox-cluster-client-0: connection to 10.250.0.1:24007 failed (No route to host)
[2014-09-01 06:23:26.607180] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
[2014-09-01 06:23:26.608032] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-09-01 06:23:26.608395] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
[2014-09-01 06:23:26.608420] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-09-01 06:23:26.608606] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
[2014-09-01 06:23:40.604979] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
There is also no healing traffic now. I could try disconnecting the second server now to see whether it fails over. I don't really believe it will :(
here are some logs for stor1 server (the one I've disconnected):
root@stor1:~# cat /var/log/glusterfs/bricks/exports-HA-2TB-TT-Proxmox-cluster-2TB.log
[2014-09-01 06:19:26.403323] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:26.403399] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/112/vm-112-disk-1.raw
[2014-09-01 06:19:26.403486] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:29.475318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:29.475373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:36.963318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:36.963373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:40.419298] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:40.419355] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:42.531310] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:19:42.531368] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:23:25.088518] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:25.532734] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:26.608074] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:27.187556] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:27.213890] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:31.222654] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
[2014-09-01 06:23:52.591365] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:23:52.591447] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: releasing lock on 14f70955-5e1e-4499-b66b-52cd50892315 held by {client=0x7f2494001ed0,pid=0 lk-owner=bc3ddbdbae7f0000}
[2014-09-01 06:23:52.591568] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
[2014-09-01 06:23:52.591679] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
[2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:24:00.741598] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:30:06.010819] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
[2014-09-01 06:30:08.056059] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:30:08.056127] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:36:54.307743] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
[2014-09-01 06:36:56.340078] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:36:56.340122] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:46:53.601517] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
[2014-09-01 06:46:55.624705] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
[2014-09-01 06:46:55.624793] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
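
The tail of the brick log shows clients that are accepted and then disconnected about two seconds later. A minimal Python sketch for picking out such short-lived connections is given here. It assumes the log has one entry per line in the format shown in this message; the function name and the 5-second threshold are illustrative choices, not part of GlusterFS.

```python
import re
from datetime import datetime

# Matches the timestamp, the event text, and the connection id of a brick log entry.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\].*?"
    r"(?P<event>accepted client from|disconnecting connection from)\s+"
    r"(?P<conn>\S+)"
)

# Two entries copied from the brick log in this thread.
SAMPLE = [
    "[2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume] "
    "0-HA-2TB-TT-Proxmox-cluster-server: accepted client from "
    "stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0 "
    "(version: 3.5.2)",
    "[2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify] "
    "0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connection from "
    "stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0",
]


def short_lived_connections(lines, max_seconds=5.0):
    """Return (connection id, lifetime in seconds) for connections that were
    accepted and disconnected again within max_seconds (hypothetical threshold)."""
    accepted = {}
    result = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        conn = match.group("conn")
        if match.group("event").startswith("accepted"):
            accepted[conn] = ts
        elif conn in accepted:
            lifetime = (ts - accepted.pop(conn)).total_seconds()
            if lifetime <= max_seconds:
                result.append((conn, lifetime))
    return result


for conn, lifetime in short_lived_connections(SAMPLE):
    print(f"{conn}: {lifetime:.3f} s")
```

The sketch pairs entries only when the connection id is identical, so a reconnect logged under a different id suffix is not matched.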

Are you running any commands like 'gluster volume heal <volname> info' continuously?


Pranith


The last two lines are pretty unclear. Why did it disconnect?

2014-09-01 9:41 GMT+03:00 Pranith Kumar Karampuri <[email protected]<mailto:[email protected]>>:



    On 09/01/2014 12:08 PM, Roman wrote:

    Well, as far as I can tell, the VMs are not much affected by the
    healing process. At least the munin server, which runs with a
    pretty high load (the average rarely goes below 0.9 :) ), had no
    problems. To create some more load I copied a 590 MB file onto
    the VM's disk. It took 22 seconds, which is about 27 MB/s, or
    roughly 214 Mbit/s.
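
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 MB = 8 Mbit), and the function name is made up for illustration.

```python
def throughput(size_mb, seconds):
    """Return (MB/s, Mbit/s) for a transfer of size_mb megabytes in the given time."""
    mb_per_s = size_mb / seconds
    mbit_per_s = mb_per_s * 8  # 8 bits per byte
    return mb_per_s, mbit_per_s


mb_s, mbit_s = throughput(590, 22)
print(f"{mb_s:.1f} MB/s = {mbit_s:.1f} Mbit/s")  # prints "26.8 MB/s = 214.5 Mbit/s"
```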

    The servers are connected via a 10 Gbit network. The Proxmox client
    is connected to the servers over a separate 1 Gbit/s interface. We
    are thinking of moving it to 10 Gbit/s as well.

    Here is some heal info, which is pretty confusing.

    Right after the first server restored its connection, it was pretty clear:

    root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
    Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
    /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
    Number of entries: 1

    Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
    /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
    /images/112/vm-112-disk-1.raw - Possibly undergoing heal
    Number of entries: 2


    Some time later it says:
    root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
    Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
    Number of entries: 0

    Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
    Number of entries: 0

    while I can still see traffic between the servers, and there have
    still been no messages about completion of the healing process.
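
To compare such snapshots over time, the per-brick entry counts can be pulled out of the 'heal info' output. The following is a minimal Python sketch that assumes the output format quoted above (a "Brick ..." line followed later by "Number of entries: N"); the function name is hypothetical, and other GlusterFS versions may print a different layout. A count of zero here only reflects what the command printed, not that all healing traffic has stopped.

```python
# Output copied from the first 'heal info' run quoted in this thread.
SAMPLE = """\
Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
/images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
/images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
/images/112/vm-112-disk-1.raw - Possibly undergoing heal
Number of entries: 2
"""


def pending_heal_entries(heal_info_output):
    """Map each brick to the 'Number of entries' value reported for it."""
    counts = {}
    brick = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            counts[brick] = int(line.split(":", 1)[1])
    return counts


for brick, count in pending_heal_entries(SAMPLE).items():
    print(f"{brick}: {count}")
```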

    On which machine do we have the mount?

    Pranith




    2014-08-29 10:02 GMT+03:00 Pranith Kumar Karampuri
    <[email protected] <mailto:[email protected]>>:

        Wow, this is great news! Thanks a lot for sharing the results
        :-). Did you get a chance to test the performance of the
        applications in the vm during self-heal?
        May I know more about your use case? i.e. How many vms and
        what is the avg size of each vm etc?

        Pranith


        On 08/28/2014 11:27 PM, Roman wrote:

        Here are the results.
        1. I still have a problem with log rotation: logs are being
        written to the .log.1 file, not the .log file. Any hints on
        how to fix this?
        2. The healing logs are much better now; I can see the
        success message.
        3. Both volumes, with HD off and on, synced successfully. The
        volume with HD on synced much faster.
        4. Both VMs on the volumes survived the outage, even though
        files were added and deleted during the outage.

        So replication works well for VM volumes with HD both on and
        off, and it is even faster with HD on. I still need to solve
        the logging issue.

        Seems we could start production storage from this moment :)
        The whole company will use it. Some distributed and some
        replicated. Thanks for great product.


        2014-08-27 16:03 GMT+03:00 Roman <[email protected]
        <mailto:[email protected]>>:

            Installed new packages. Will make some tests tomorrow.
            thanx.


            2014-08-27 14:10 GMT+03:00 Pranith Kumar Karampuri
            <[email protected] <mailto:[email protected]>>:


                On 08/27/2014 04:38 PM, Kaleb KEITHLEY wrote:

                    On 08/27/2014 03:09 AM, Humble Chirammal wrote:



                        ----- Original Message -----
                        | From: "Pranith Kumar Karampuri"
                        <[email protected]
                        <mailto:[email protected]>>
                        | To: "Humble Chirammal"
                        <[email protected]
                        <mailto:[email protected]>>
                        | Cc: "Roman" <[email protected]
                        <mailto:[email protected]>>,
                        [email protected]
                        <mailto:[email protected]>, "Niels
                        de Vos" <[email protected]
                        <mailto:[email protected]>>
                        | Sent: Wednesday, August 27, 2014 12:34:22 PM
                        | Subject: Re: [Gluster-users] libgfapi
                        failover problem on replica bricks
                        |
                        |
                        | On 08/27/2014 12:24 PM, Roman wrote:
                        | > root@stor1:~# ls -l /usr/sbin/glfsheal
                        | > ls: cannot access /usr/sbin/glfsheal: No
                        such file or directory
                        | > Seems like not.
                        | Humble,
                        |       Seems like the binary is still not
                        packaged?

                        Checking with Kaleb on this.

                    ...

                        | >>>            |
                        | >>>            | Humble/Niels,
                        | >>>            |     Do we have debs
                        available for 3.5.2? In 3.5.1
                        | >>>  there was packaging
                        | >>>            | issue where
                        /usr/bin/glfsheal is not packaged along
                        | >>>  with the deb. I
                        | >>>            | think that should be
                        fixed now as well?
                        | >>>            |
                        | >>>  Pranith,
                        | >>>
                        | >>>            The 3.5.2 packages for
                        debian is not available yet. We
                        | >>>            are co-ordinating
                        internally to get it processed.
                        | >>>            I will update the list once
                        its available.
                        | >>>
                        | >>>  --Humble


                    glfsheal isn't in our 3.5.2-1 DPKGs either. We
                    (meaning I) started with the 3.5.1 packaging
                    bits from Semiosis. Perhaps he fixed 3.5.1 after
                    giving me his bits.

                    I'll fix it and spin 3.5.2-2 DPKGs.

                That is great Kaleb. Please notify semiosis as well
                in case he is yet to fix it.

                Pranith

--

                    Kaleb

--Best regards,

            Roman.

--Best regards,

        Roman.

--Best regards,

    Roman.





--
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
