Hi Mark,

Sorry, I was busy and could not take a serious look at the logs. I will update you on Monday.

Thanks,
Kotresh HR

On Wed, Jun 20, 2018 at 12:32 PM, Mark Betham <[email protected]> wrote:

> Hi Kotresh,
>
> I was wondering if you had made any progress regarding the issue I am currently experiencing with geo-replication.
>
> For info, the fault remains and effectively requires a daily restart of the geo-replication service to reclaim the memory used on the slave node.
>
> If you require any further information then please do not hesitate to ask.
>
> Many thanks,
>
> Mark Betham
>
>
> On Mon, 11 Jun 2018 at 08:24, Mark Betham <[email protected]> wrote:
>
>> Hi Kotresh,
>>
>> Many thanks. I will shortly set up a share on my GDrive and send the link directly to you.
>>
>> For info:
>> The geo-rep slave failed again over the weekend, but this time it did not recover. It looks to have become unresponsive at around 14:40 UTC on 9th June. I have attached an image showing the memory usage, and you can see from this when the system failed. The system was totally unresponsive and required a cold power-off and power-on to recover the server.
>>
>> Many thanks for your help.
>>
>> Mark Betham.
>>
>> On 11 June 2018 at 05:53, Kotresh Hiremath Ravishankar <[email protected]> wrote:
>>
>>> Hi Mark,
>>>
>>> Google Drive works for me.
>>>
>>> Thanks,
>>> Kotresh HR
>>>
>>> On Fri, Jun 8, 2018 at 3:00 PM, Mark Betham <[email protected]> wrote:
>>>
>>>> Hi Kotresh,
>>>>
>>>> The memory issue occurred again. This indicates it occurs around once a day.
>>>>
>>>> Again, no traceback was listed in the log; the only update in the log was as follows:
>>>>
>>>> [2018-06-08 08:26:43.404261] I [resource(slave):1020:service_loop] GLUSTER: connection inactive, stopping timeout=120
>>>> [2018-06-08 08:29:19.357615] I [syncdutils(slave):271:finalize] <top>: exiting.
>>>> [2018-06-08 08:31:02.432002] I [resource(slave):1502:connect] GLUSTER: Mounting gluster volume locally...
>>>> [2018-06-08 08:31:03.716967] I [resource(slave):1515:connect] GLUSTER: Mounted gluster volume duration=1.2729
>>>> [2018-06-08 08:31:03.717411] I [resource(slave):1012:service_loop] GLUSTER: slave listening
>>>>
>>>> I have attached an image showing the latest memory usage pattern.
>>>>
>>>> Can you please advise how I can pass the log data across to you? As soon as I know this I will get the data uploaded for your review.
>>>>
>>>> Thanks,
>>>>
>>>> Mark Betham
>>>>
>>>> On 7 June 2018 at 08:19, Mark Betham <[email protected]> wrote:
>>>>
>>>>> Hi Kotresh,
>>>>>
>>>>> Many thanks for your prompt response.
>>>>>
>>>>> Below are my responses to your questions:
>>>>>
>>>>> 1. Is this traceback hit consistently? I just wanted to confirm whether it is transient, occurring once in a while and then returning to normal.
>>>>> It appears not. As soon as the geo-rep recovered from the high memory usage yesterday, it immediately began rising again until it had consumed all of the available RAM, but this time nothing was committed to the log file.
>>>>> I would like to add here that this current instance of geo-rep was only brought online at the start of this week, due to the issues with glibc on CentOS 7.5. This is the first time I have had geo-rep running with Gluster ver 3.12.9; both storage clusters at each physical site were only rebuilt approx. 4 weeks ago, due to the previous version in use going EOL.
>>>>> Prior to this I had been running 3.13.2 (3.13.x is now EOL) at each of the sites, and it is worth noting that the same behaviour was also seen on that version of Gluster. Unfortunately I do not have any of the log data from then, but I do not recall seeing any instances of the traceback message mentioned.
>>>>>
>>>>> 2. Please upload the complete geo-rep logs from both master and slave.
>>>>> I have the log files; I am just checking to make sure there is no confidential info inside. The log files are too big to send via email, even when compressed. Do you have a preferred method for me to share this data with you, or would a share from my Google Drive be sufficient?
>>>>>
>>>>> 3. Are the Gluster versions the same across master and slave?
>>>>> Yes, all Gluster versions are the same across the two sites for all storage nodes. See below for version info taken from the current geo-rep master.
>>>>>
>>>>> glusterfs 3.12.9
>>>>> Repository revision: git://git.gluster.org/glusterfs.git
>>>>> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
>>>>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>>>>> It is licensed to you under your choice of the GNU Lesser
>>>>> General Public License, version 3 or any later version (LGPLv3
>>>>> or later), or the GNU General Public License, version 2 (GPLv2),
>>>>> in all cases as published by the Free Software Foundation.
>>>>>
>>>>> glusterfs-geo-replication-3.12.9-1.el7.x86_64
>>>>> glusterfs-gnfs-3.12.9-1.el7.x86_64
>>>>> glusterfs-libs-3.12.9-1.el7.x86_64
>>>>> glusterfs-server-3.12.9-1.el7.x86_64
>>>>> glusterfs-3.12.9-1.el7.x86_64
>>>>> glusterfs-api-3.12.9-1.el7.x86_64
>>>>> glusterfs-events-3.12.9-1.el7.x86_64
>>>>> centos-release-gluster312-1.0-1.el7.centos.noarch
>>>>> glusterfs-client-xlators-3.12.9-1.el7.x86_64
>>>>> glusterfs-cli-3.12.9-1.el7.x86_64
>>>>> python2-gluster-3.12.9-1.el7.x86_64
>>>>> glusterfs-rdma-3.12.9-1.el7.x86_64
>>>>> glusterfs-fuse-3.12.9-1.el7.x86_64
>>>>>
>>>>> I have also attached another screenshot showing the memory usage of the Gluster slave for the last 48 hours. This shows the memory saturation from yesterday, which correlates with the traceback sent yesterday, and the subsequent memory saturation which occurred over the last 24 hours. For info, all times are in UTC.
>>>>>
>>>>> Please advise the preferred method to get the log data across to you, and also whether you require any further information.
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Mark Betham
>>>>>
>>>>> On 7 June 2018 at 04:42, Kotresh Hiremath Ravishankar <[email protected]> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> A few questions.
>>>>>>
>>>>>> 1. Is this traceback hit consistently? I just wanted to confirm whether it is transient, occurring once in a while and then returning to normal.
>>>>>> 2. Please upload the complete geo-rep logs from both master and slave.
>>>>>> 3. Are the Gluster versions the same across master and slave?
>>>>>>
>>>>>> Thanks,
>>>>>> Kotresh HR
>>>>>>
>>>>>> On Wed, Jun 6, 2018 at 7:10 PM, Mark Betham <[email protected]> wrote:
>>>>>>
>>>>>>> Dear Gluster-Users,
>>>>>>>
>>>>>>> I have geo-replication set up and configured between two Gluster pools located at different sites.
>>>>>>>
>>>>>>> What I am seeing is an error being reported within the geo-replication slave log as follows:
>>>>>>>
>>>>>>> [2018-06-05 12:05:26.767615] E [syncdutils(slave):331:log_raise_exception] <top>: FAIL:
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
>>>>>>>     tf(*aa)
>>>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1009, in <lambda>
>>>>>>>     t = syncdutils.Thread(target=lambda: (repce.service_loop(),
>>>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 90, in service_loop
>>>>>>>     self.q.put(recv(self.inf))
>>>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 61, in recv
>>>>>>>     return pickle.load(inf)
>>>>>>> ImportError: No module named h_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh
>>>>>>> [2018-06-05 12:05:26.768085] E [repce(slave):117:worker] <top>: call failed:
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
>>>>>>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>>>>>> TypeError: getattr(): attribute name must be string
>>>>>>>
>>>>>>> From this point the slave server begins to consume all of its available RAM until it becomes unresponsive. Eventually the Gluster service seems to kill off the offending process and the memory is returned to the system. Once the memory has been returned to the remote slave system, the geo-replication often recovers and data transfer resumes. (A brief sketch of how a corrupted pickle stream produces this kind of ImportError is included below.)
>>>>>>>
>>>>>>> I have attached the full geo-replication slave log containing the error shown above. I have also attached an image file showing the memory usage of the affected storage server.
>>>>>>>
>>>>>>> We are currently running Gluster version 3.12.9 on top of CentOS 7.5 x86_64. The system has been fully patched and is running the latest software, excluding glibc, which had to be downgraded to get geo-replication working.
>>>>>>>
>>>>>>> The Gluster volume runs on a dedicated partition using the XFS filesystem, which in turn runs on an LVM thin volume. The physical storage is presented as a single drive because the underlying disks are part of a RAID 10 array.
>>>>>>>
>>>>>>> The master volume being replicated holds a total of 2.2 TB of data. The total size of the volume fluctuates very little, as the data being removed roughly equals the new data coming in. The data is made up of many thousands of files across many separate directories. File sizes vary from the very small (<1 KB) to the large (>1 GB). The Gluster service itself runs with a single volume in a replicated configuration across 3 bricks at each of the sites. The delta changes being replicated average about 100 GB per day, including file creation / deletion / modification.
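>>>>>>>
>>>>>>> As an aside on the ImportError above: per the traceback, repce's recv() is a plain pickle.load() on the RPC stream, so if stray bytes (here, what appears to be a filename) land on that channel, the unpickler can read them as a GLOBAL opcode and try to import the text as a module. A minimal sketch of that failure mode in plain Python (the byte string is fabricated for illustration and is not taken from the gluster code):
>>>>>>>
>>>>>>> import pickle
>>>>>>>
>>>>>>> # The protocol-0 GLOBAL opcode 'c' is followed by "module\nname\n"
>>>>>>> # and makes the unpickler import that module. Stray text in the
>>>>>>> # stream therefore surfaces as "ImportError: No module named ...",
>>>>>>> # naming the stray text, much like the filename in the traceback.
>>>>>>> bogus_stream = b"ch_2013-04-26-bogus.gz\nX\n."  # hypothetical corrupt RPC bytes
>>>>>>> try:
>>>>>>>     pickle.loads(bogus_stream)
>>>>>>> except ImportError as err:  # ModuleNotFoundError on Python 3
>>>>>>>     print("unpickle failed:", err)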
>>>>>>>
>>>>>>> The config for the geo-replication session is as follows, taken from the current source server:
>>>>>>>
>>>>>>> special_sync_mode: partial
>>>>>>> gluster_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.gluster.log
>>>>>>> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>>>>>>> change_detector: changelog
>>>>>>> session_owner: 40e9e77a-034c-44a2-896e-59eec47e8a84
>>>>>>> state_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.status
>>>>>>> gluster_params: aux-gfid-mount acl
>>>>>>> log_rsync_performance: true
>>>>>>> remote_gsyncd: /nonexistent/gsyncd
>>>>>>> working_dir: /var/lib/misc/glusterfsd/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1
>>>>>>> state_detail_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-detail.status
>>>>>>> gluster_command_dir: /usr/sbin/
>>>>>>> pid_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.pid
>>>>>>> georep_session_working_dir: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/
>>>>>>> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
>>>>>>> master.stime_xattr_name: trusted.glusterfs.40e9e77a-034c-44a2-896e-59eec47e8a84.ccfaed9b-ff4b-4a55-acfa-03f092cdf460.stime
>>>>>>> changelog_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-changes.log
>>>>>>> socketdir: /var/run/gluster
>>>>>>> volume_id: 40e9e77a-034c-44a2-896e-59eec47e8a84
>>>>>>> ignore_deletes: false
>>>>>>> state_socket_unencoded: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.socket
>>>>>>> log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.log
>>>>>>>
>>>>>>> If any further information is required in order to troubleshoot this issue then please let me know.
>>>>>>>
>>>>>>> I would be very grateful for any help or guidance received.
>>>>>>>
>>>>>>> Many thanks,
>>>>>>>
>>>>>>> Mark Betham.
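>>>>>>>
>>>>>>> P.S. For reference, the session above is inspected and managed through the standard geo-replication CLI; commands along these lines (volume and host names taken from the config paths above) show the session state and perform the stop/start cycle used to recover from the memory growth:
>>>>>>>
>>>>>>> # show per-worker session state
>>>>>>> gluster volume geo-replication glustervol0 storage-server.local::glustervol1 status detail
>>>>>>> # dump the session config listed above
>>>>>>> gluster volume geo-replication glustervol0 storage-server.local::glustervol1 config
>>>>>>> # restart the session to reclaim slave-side memory
>>>>>>> gluster volume geo-replication glustervol0 storage-server.local::glustervol1 stop
>>>>>>> gluster volume geo-replication glustervol0 storage-server.local::glustervol1 start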
>>>>>>>
>>>>>>> This email may contain confidential material; unintended recipients must not disseminate, use, or act upon any information in it. If you received this email in error, please contact the sender and permanently delete the email.
>>>>>>> Performance Horizon Group Limited | Registered in England & Wales 07188234 | Level 8, West One, Forth Banks, Newcastle upon Tyne, NE1 3PA
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Kotresh H R
>>>>>
>>>>> --
>>>>> MARK BETHAM
>>>>> Senior System Administrator
>>>>> +44 (0) 191 261 2444
>>>>> performancehorizon.com
>>>>> PerformanceHorizon <https://www.facebook.com/PerformanceHorizon>
>>>>> tweetphg <https://twitter.com/tweetphg>
>>>>> performance-horizon-group <https://www.linkedin.com/company-beta/1484320/>
>>>>
>>>> --
>>>> MARK BETHAM
>>>> Senior System Administrator
>>>> +44 (0) 191 261 2444
>>>> performancehorizon.com
>>>> PerformanceHorizon <https://www.facebook.com/PerformanceHorizon>
>>>> tweetphg <https://twitter.com/tweetphg>
>>>> performance-horizon-group <https://www.linkedin.com/company-beta/1484320/>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>
>> --
>> MARK BETHAM
>> Senior System Administrator
>> +44 (0) 191 261 2444
>> performancehorizon.com
>> PerformanceHorizon <https://www.facebook.com/PerformanceHorizon>
>> tweetphg <https://twitter.com/tweetphg>
>> performance-horizon-group <https://www.linkedin.com/company-beta/1484320/>
>
> --
> MARK BETHAM
> Senior System Administrator
> +44 (0) 191 261 2444
> performancehorizon.com
> PerformanceHorizon <https://www.facebook.com/PerformanceHorizon>
> tweetphg <https://twitter.com/tweetphg>
> performance-horizon-group <https://www.linkedin.com/company-beta/1484320/>

--
Thanks and Regards,
Kotresh H R
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
