FYI, I logged this in the GitHub issue tracker: https://github.com/corosync/corosync/issues/70
Yes, RHEL 6 with 2.3.4 compiled via source RPM. IIRC no spec changes were
needed. "process" referred to "corosync" run without options.

On Mon, May 11, 2015 at 9:54 AM, Jan Friesse <[email protected]> wrote:

> Moving back to list.
>
> otheus uibk wrote:
>
>> Logs from 4 days ago indicate the memory back then was stable... 329 MB
>> and < 17 MB after a restart. One difference may be that on the 4th of
>> May, no application was using Corosync. So today, after reverting to the
>> previous configuration, on the same host, without any other
>> configuration changes, the RES crept up to 8.7 GB over the span of less
>> than 6 hours. Strange.
>
> What application are you using? What is "process"?
>
>> Sample logs. Again, Time / RSS / SZ parameters:
>>
>> 23:32:01   8292     21363    # after process started
>> 23:36:01   121584   50821
>> 23:47:01   410976   127128
>> 23:57:01   599244   196763
>> 00:07:01   855768   266144
>> 00:17:02   1203376  335780
>> 00:27:01   1221156  405186
>> ...
>> 01:27:01   2963996  821724
>> 02:27:01   4660864  1238277
>> 03:27:01   6495140  1653085
>> 04:27:01   6548324  2068631
>> 05:27:01   9068520  2477199
>> 06:07:01   9104724  2754738
>>
>> On Thu, May 7, 2015 at 11:25 PM, otheus uibk <[email protected]> wrote:
>>
>>> Reproducible.
>>>
>>> libqb 0.16.0, release 2.el6, on the systems with and without the
>>> memory leak problem.
>
> So you are using RHEL 6 with your own corosync 2.3.4 package?
>
>>> On the memory-leaking systems, nspr is 4.10.2; on the non-leaking
>>> systems, 4.10.8. However, 3 days earlier, before the configuration
>>> change, there was no memory leak.
>>>
>>> I have a cronjob which captures the output of ps every minute. Here are
>>> the "rss" and "sz" columns of corosync, with times:
>>>
>>> 17:53:01   3952     18271
>>> 17:54:02   18496    21889
>>> 17:58:01   70588    37823
>>> 18:00:01   113904   45790
>>> 18:06:01   207264   69177
>>> 18:10:01   245152   83857
>>> 18:15:01   338212   103154
>>> 18:20:01   402860   123200
>>> ...
>>> 19:00:01   985844   282619
>>> 19:10:01   1058220  322711
>>> ...
>>> 20:00:01   1960572  522400
>>> 20:30:01   2289444  642419
>>>
>>> Current configuration (with IPs smudged):
>>>
>>> quroum {
>>>     provider: corosync_votequorum
>>>     expected_votes: 2
>>> }
>>> aisexec {
>>>     user: root
>>>     group: root
>>> }
>>> #service {
>>> #    name: pacemaker
>>> #    ver: 1
>>> #}
>>> totem {
>>>     version: 2
>>>
>>>     # if "on" must use shared "corosync-keygen".
>>>     secauth: off
>>>     threads: 2
>>>     rrp_mode: none
>>>     transport: udpu
>>>     interface {
>>>         bindnetaddr: 172.24.0.0
>>>         # Rings must be consecutively numbered, starting at 0.
>>>         ringnumber: 0
>>>         mcastport: 5561
>>>     }
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_stderr: no
>>>     to_logfile: yes
>>>     logfile: /var/log/corosync.log
>>>     to_syslog: no
>>>     debug: off
>>>     timestamp: on
>>>     logger_subsys {
>>>         subsys: AMF
>>>         debug: on
>>>     }
>>> }
>>>
>>> nodelist {
>>>     node {
>>>         ring0_addr: 138.x.x.x
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.2.61
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.1.61
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.1.37
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.2.37
>>>     }
>>> }
>>>
>>> Here is a diff of the configuration before/after (again, IPs smudged):
>>>
>>> diff --git a/corosync/corosync.conf b/corosync/corosync.conf
>>> index cc9c151..5e3e695 100644
>>> --- a/corosync/corosync.conf
>>> +++ b/corosync/corosync.conf
>>> @@ -23,15 +23,6 @@ totem {
>>>          # Rings must be consecutively numbered, starting at 0.
>>>          ringnumber: 0
>>>          mcastport: 5561
>>> -        member {
>>> -                memberaddr: 138.x.x.x
>>> -        }
>>> -        member {
>>> -                memberaddr: 172.24.1.61
>>> -        }
>>> -        member {
>>> -                memberaddr: 172.24.2.61
>>> -        }
>>>      }
>>> }
>>>
>>> @@ -48,3 +39,21 @@ logging {
>>>          debug: on
>>>      }
>>> }
>>> +
>>> +nodelist {
>>> +    node {
>>> +        ring0_addr: 138.x.x.x
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.2.61
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.1.61
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.1.37
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.2.37
>>> +    }
>>> +}
>>>
>>> On Thu, May 7, 2015 at 5:10 PM, Jan Friesse <[email protected]> wrote:
>>>
>>>> Otheus,
>>>>
>>>> otheus uibk wrote:
>>>>
>>>>> Here is a top excerpt from corosync 2.3.4 running for under 15 hours:
>>>>>
>>>>>   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> 15406 root   20   0 10.5g 9.2g 9168 S  2.0 58.8 11:50.07  corosync
>>>>>
>>>>> (I'm using a fixed-width font in Gmail; I have no idea what happens to
>>>>> this text when going via pipermail.)
>>>>>
>>>>> It's showing a RES usage of 9.2 GB. This high memory usage appeared
>>>>> after a relatively minor configuration change -- moving the listed
>>>>> nodes from totem.interface.member { } to nodelist.node { }. Two nodes
>>>>> were also added. AFAIK this was the only change.
>>>>
>>>> This looks like a serious issue. Are you able to reproduce it? What
>>>> version of libqb are you using?
>>>>
>>>> Regards,
>>>>   Honza
>>>>
>>>>> A review of changes since 2.3.4 indicates this has not been fixed
>>>>> since that release.

--
Otheus
[email protected]
[email protected]
_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
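
[Editor's note] The per-minute sampling described in the thread ("I have a cronjob which captures the output of ps every minute") could be set up roughly as in the sketch below. The thread does not show the actual job, so the file name /etc/cron.d/corosync-mem, the log path, and the exact ps invocation are assumptions; only the idea of recording a timestamp plus the rss and sz columns comes from the thread.

    # /etc/cron.d/corosync-mem -- hypothetical; appends "HH:MM:SS RSS SZ" for
    # each corosync process once per minute, like the tables quoted above.
    * * * * *  root  ps -C corosync -o rss=,sz= | awk -v t="$(date +\%H:\%M:\%S)" '{ print t, $1, $2 }' >> /var/log/corosync-mem.log

Note that the percent signs are escaped because cron otherwise treats an unescaped % in the command field as a newline.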
