Well, yeah. It continues to write to the log.1 file after rotation:

root@stor1:~# ls -lo /proc/9162/fd/
total 0
lr-x------ 1 root 64 Sep 2 11:23 0 -> /dev/null
l-wx------ 1 root 64 Sep 2 11:23 1 -> /dev/null
lrwx------ 1 root 64 Sep 2 11:23 10 -> socket:[1419369]
lrwx------ 1 root 64 Sep 2 11:23 11 -> socket:[1545340]
lrwx------ 1 root 64 Sep 2 11:23 12 -> socket:[1545680]
lrwx------ 1 root 64 Sep 2 11:23 13 -> socket:[1545349]
lrwx------ 1 root 64 Sep 2 11:23 14 -> socket:[1545351]
lrwx------ 1 root 64 Sep 2 11:23 15 -> socket:[1129990]
lrwx------ 1 root 64 Sep 2 11:23 16 -> socket:[1545681]
lrwx------ 1 root 64 Sep 2 11:23 17 -> socket:[1545692]
lrwx------ 1 root 64 Sep 2 11:23 18 -> socket:[1545850]
lrwx------ 1 root 64 Sep 2 11:23 19 -> socket:[1545716]
l-wx------ 1 root 64 Sep 2 11:23 2 -> /dev/null
lrwx------ 1 root 64 Sep 2 11:23 20 -> socket:[1129991]
lrwx------ 1 root 64 Sep 2 11:23 21 -> socket:[1419031]
lrwx------ 1 root 64 Sep 3 10:03 22 -> socket:[1545727]
lrwx------ 1 root 64 Sep 2 11:23 3 -> anon_inode:[eventpoll]
l-wx------ 1 root 64 Sep 2 11:23 4 -> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log.1
lrwx------ 1 root 64 Sep 2 11:23 5 -> /run/glusterd.pid
lrwx------ 1 root 64 Sep 2 11:23 6 -> socket:[1473128]
lrwx------ 1 root 64 Sep 2 11:23 7 -> socket:[1472592]
l-wx------ 1 root 64 Sep 2 11:23 8 -> /var/log/glusterfs/.cmd_log_history
lrwx------ 1 root 64 Sep 2 11:23 9 -> socket:[1419368]
root@stor1:~#
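[A note on the rotation problem: fd 4 above is the whole story. logrotate renames the file, and glusterd keeps writing through its still-open descriptor, so everything lands in the renamed .log.1. A minimal workaround sketch, assuming the Debian layout with a drop-in under /etc/logrotate.d/ (the file name here is an assumption):

root@stor1:~# cat /etc/logrotate.d/glusterfs-common
# copytruncate copies the log aside and truncates the original in place,
# so the daemon's open fd keeps pointing at the live .log file.
/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
        weekly
        rotate 8
        missingok
        notifempty
        compress
        delaycompress
        copytruncate
}

If the small copy window of copytruncate is a concern, the other route is a postrotate script that HUPs the daemon, e.g. kill -HUP "$(cat /run/glusterd.pid)" (the pid file is visible as fd 5 above). glusterd is generally said to reopen its log files on SIGHUP, but treat that as something to verify on 3.5.x rather than a given.]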
2014-09-02 18:01 GMT+03:00 Roman <[email protected]>:

> Same here.
> But it just never started to heal, nor sync, nor anything, when I wrote
> the first message :)
> Now it runs very smoothly, except for logging, which I will check
> tomorrow. Thanks for the feedback though!
>
>
> 2014-09-02 17:20 GMT+03:00 Peter Linder <[email protected]>:
>
>> In my setup, proxmox does have a glusterfs mount, but it is for
>> management purposes only, i.e. creating images and such. The real
>> business is done with libgfapi, which means that the kvm process itself
>> is the gluster client. It will most certainly trigger a self-heal
>> itself, so the self-heal daemon won't pick it up, and it doesn't have
>> anywhere to log that I know of.
>>
>> That being said, glusterfs has always recovered nicely whenever I have
>> lost and recovered a server, but the healing seems to need an hour or
>> so, based on CPU and network usage graphs...
>>
>> On 9/1/2014 9:26 AM, Roman wrote:
>>
>> Hmm, I don't know how, but both VMs survived the second server
>> outage :) Still no message about healing completion anywhere, though :)
>>
>> 2014-09-01 10:13 GMT+03:00 Roman <[email protected]>:
>>
>>> The mount is on the proxmox machine.
>>>
>>> Here are the logs from disconnection till reconnection:
>>>
>>> [2014-09-01 06:19:38.059383] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on 10.250.0.1:24007 failed (Connection timed out)
>>> [2014-09-01 06:19:40.338393] W [socket.c:522:__socket_rwv] 0-HA-2TB-TT-Proxmox-cluster-client-0: readv on 10.250.0.1:49159 failed (Connection timed out)
>>> [2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from 10.250.0.1:49159. Client process will keep trying to connect to glusterd until brick's port is available
>>> [2014-09-01 06:19:49.196768] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
>>> [2014-09-01 06:20:05.565444] E [socket.c:2161:socket_connect_finish] 0-HA-2TB-TT-Proxmox-cluster-client-0: connection to 10.250.0.1:24007 failed (No route to host)
>>> [2014-09-01 06:23:26.607180] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
>>> [2014-09-01 06:23:26.608032] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> [2014-09-01 06:23:26.608395] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
>>> [2014-09-01 06:23:26.608420] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>> [2014-09-01 06:23:26.608606] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
>>> [2014-09-01 06:23:40.604979] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>
>>> Now there is no healing traffic either. I could try disconnecting the
>>> second server now, to see if it is going to fail over. I don't really
>>> believe it will :(
>>>
>>> Here are some logs from the stor1 server (the one I disconnected):
>>>
>>> root@stor1:~# cat /var/log/glusterfs/bricks/exports-HA-2TB-TT-Proxmox-cluster-2TB.log
>>> [2014-09-01 06:19:26.403323] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:26.403399] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/112/vm-112-disk-1.raw
>>> [2014-09-01 06:19:26.403486] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:29.475318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:29.475373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:36.963318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:36.963373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:40.419298] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:40.419355] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:42.531310] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:19:42.531368] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:23:25.088518] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:25.532734] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:26.608074] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:27.187556] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:27.213890] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:31.222654] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)
>>> [2014-09-01 06:23:52.591365] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:23:52.591447] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: releasing lock on 14f70955-5e1e-4499-b66b-52cd50892315 held by {client=0x7f2494001ed0, pid=0 lk-owner=bc3ddbdbae7f0000}
>>> [2014-09-01 06:23:52.591568] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
>>> [2014-09-01 06:23:52.591679] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
>>> [2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:24:00.741598] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:30:06.010819] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
>>> [2014-09-01 06:30:08.056059] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:30:08.056127] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:36:54.307743] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
>>> [2014-09-01 06:36:56.340078] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:36:56.340122] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:46:53.601517] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)
>>> [2014-09-01 06:46:55.624705] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>> [2014-09-01 06:46:55.624793] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>>
>>> The last two lines are pretty unclear. Why did it disconnect?
>>>
>>>
>>> 2014-09-01 9:41 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>
>>>> On 09/01/2014 12:08 PM, Roman wrote:
>>>>
>>>> Well, as for me, the VMs are not much impacted by the healing
>>>> process. At least the munin server, running with a pretty high load
>>>> (load average rarely goes below 0.9 :) ), had no problems. To create
>>>> some more load I made a copy of a 590 MB file on the VM's disk. It
>>>> took 22 seconds, which is ca 27 MB/s, or about 214 Mbit/s.
>>>>
>>>> The servers are connected via a 10 Gbit network. The Proxmox client
>>>> is connected to the servers over a separate 1 Gbit interface. We are
>>>> thinking of moving it to 10 Gbit as well.
>>>>
>>>> Here is some heal info, which is pretty confusing.
>>>>
>>>> Right after the 1st server restored its connection, it was pretty
>>>> clear:
>>>>
>>>> root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
>>>> Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
>>>> Number of entries: 1
>>>>
>>>> Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
>>>> /images/112/vm-112-disk-1.raw - Possibly undergoing heal
>>>> Number of entries: 2
>>>>
>>>> Some time later it says:
>>>>
>>>> root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
>>>> Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> Number of entries: 0
>>>>
>>>> Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> Number of entries: 0
>>>>
>>>> while I could still see traffic between the servers, and there was
>>>> still no message about healing completion.
>>>>
>>>> On which machine do we have the mount?
>>>>
>>>> Pranith
>>>>
>>>>
>>>> 2014-08-29 10:02 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>
>>>>> Wow, this is great news! Thanks a lot for sharing the results :-).
>>>>> Did you get a chance to test the performance of the applications in
>>>>> the VM during self-heal? May I know more about your use case, i.e.
>>>>> how many VMs and what is the avg size of each VM, etc.?
>>>>>
>>>>> Pranith
>>>>>
>>>>> On 08/28/2014 11:27 PM, Roman wrote:
>>>>>
>>>>> Here are the results.
>>>>> 1. Still have the problem with log rotation: logs are being written
>>>>> to the .log.1 file, not the .log file. Any hints how to fix this?
>>>>> 2. Healing logs are now much better; I can see the success message.
>>>>> 3. Both volumes, with HD off and on, synced successfully. The volume
>>>>> with HD on synced much faster.
>>>>> 4. Both VMs on the volumes survived the outage, with new files being
>>>>> added and deleted during the outage.
>>>>>
>>>>> So replication works well for VM volumes with HD both on and off,
>>>>> and with HD it is even faster. We still need to solve the logging
>>>>> issue.
>>>>>
>>>>> Seems we could start the production storage from this moment :) The
>>>>> whole company will use it, some volumes distributed and some
>>>>> replicated. Thanks for a great product.
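[A note on the heal-info confusion above: "gluster volume heal ... info" only lists entries that still need, or may need, healing, so it dropping to zero while sync traffic continues is not by itself a contradiction. Two ways to cross-check completion, sketched for 3.5.x (whether the statistics subcommand is present depends on the build, so treat these as things to try):

root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info healed
root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster statistics

The first lists entries the self-heal daemons report as already healed; the second prints per-crawl counters. Failing that, the AFR pending-change xattrs can be read straight off a brick:

root@stor1:~# getfattr -d -m trusted.afr -e hex /exports/HA-2TB-TT-Proxmox-cluster/2TB/images/124/vm-124-disk-1.qcow2

All-zero trusted.afr.*-client-* values on both bricks mean nothing is pending for that file; non-zero values mean a heal is still outstanding.]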
>>>>>
>>>>> 2014-08-27 16:03 GMT+03:00 Roman <[email protected]>:
>>>>>
>>>>>> Installed the new packages. Will run some tests tomorrow. Thanks.
>>>>>>
>>>>>>
>>>>>> 2014-08-27 14:10 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>
>>>>>>> On 08/27/2014 04:38 PM, Kaleb KEITHLEY wrote:
>>>>>>>
>>>>>>>> On 08/27/2014 03:09 AM, Humble Chirammal wrote:
>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>> | From: "Pranith Kumar Karampuri" <[email protected]>
>>>>>>>>> | To: "Humble Chirammal" <[email protected]>
>>>>>>>>> | Cc: "Roman" <[email protected]>, [email protected], "Niels de Vos" <[email protected]>
>>>>>>>>> | Sent: Wednesday, August 27, 2014 12:34:22 PM
>>>>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
>>>>>>>>> |
>>>>>>>>> | On 08/27/2014 12:24 PM, Roman wrote:
>>>>>>>>> | > root@stor1:~# ls -l /usr/sbin/glfsheal
>>>>>>>>> | > ls: cannot access /usr/sbin/glfsheal: No such file or directory
>>>>>>>>> | > Seems like not.
>>>>>>>>> | Humble,
>>>>>>>>> | Seems like the binary is still not packaged?
>>>>>>>>>
>>>>>>>>> Checking with Kaleb on this.
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> | >>> | Humble/Niels,
>>>>>>>>> | >>> | Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
>>>>>>>>> | >>> | issue where /usr/bin/glfsheal is not packaged along with the deb. I
>>>>>>>>> | >>> | think that should be fixed now as well?
>>>>>>>>> | >>>
>>>>>>>>> | >>> Pranith,
>>>>>>>>> | >>>
>>>>>>>>> | >>> The 3.5.2 packages for Debian are not available yet. We are
>>>>>>>>> | >>> co-ordinating internally to get it processed. I will update
>>>>>>>>> | >>> the list once they are available.
>>>>>>>>> | >>>
>>>>>>>>> | >>> --Humble
>>>>>>>>
>>>>>>>> glfsheal isn't in our 3.5.2-1 DPKGs either. We (meaning I) started
>>>>>>>> with the 3.5.1 packaging bits from Semiosis. Perhaps he fixed
>>>>>>>> 3.5.1 after giving me his bits.
>>>>>>>>
>>>>>>>> I'll fix it and spin 3.5.2-2 DPKGs.
>>>>>>>
>>>>>>> That is great, Kaleb. Please notify semiosis as well, in case he
>>>>>>> has yet to fix it.
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>>> --
>>>>>>>> Kaleb

--
Best regards,
Roman.
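[A note on the packaging thread above: once the respun 3.5.2-2 DPKGs are out, a quick check along these lines confirms whether glfsheal actually made it into a package; the thread mentions both /usr/bin and /usr/sbin as candidate locations, so check both:

root@stor1:~# dpkg -S glfsheal
root@stor1:~# ls -l /usr/bin/glfsheal /usr/sbin/glfsheal

dpkg -S prints the owning package if the binary is shipped by any installed package; if it reports no match and both ls calls fail, "gluster volume heal ... info" will keep failing the way it did on the 3.5.1 debs.]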
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
