FYI, I logged this in the GitHub issue tracker: https://github.com/corosync/corosync/issues/70
Yes, RHEL 6 with 2.3.4 compiled via source RPM. IIRC no spec changes were
needed. "process" referred to "corosync" run without options.

On Mon, May 11, 2015 at 9:54 AM, Jan Friesse <[email protected]> wrote:

> Moving back to list.
>
> otheus uibk wrote:
>
>> Logs from 4 days ago indicate the memory back then was stable... 329 MB
>> and < 17 MB after a restart. One difference may be that on the 4th of
>> May, no application was using Corosync. So today, after reverting to the
>> previous configuration, on the same host, without any other
>> configuration changes, the RES crept up to 8.7 GB over the span of less
>> than 6 hours. Strange.
>
> What application are you using? What is "process"?
>
>> Sample logs. Again, Time / RSS / SZ parameters:
>>
>> 23:32:01   8292     21363    # after process started
>> 23:36:01   121584   50821
>> 23:47:01   410976   127128
>> 23:57:01   599244   196763
>> 00:07:01   855768   266144
>> 00:17:02   1203376  335780
>> 00:27:01   1221156  405186
>> ...
>> 01:27:01   2963996  821724
>> 02:27:01   4660864  1238277
>> 03:27:01   6495140  1653085
>> 04:27:01   6548324  2068631
>> 05:27:01   9068520  2477199
>> 06:07:01   9104724  2754738
>>
>> On Thu, May 7, 2015 at 11:25 PM, otheus uibk <[email protected]> wrote:
>>
>>> Reproducible.
>>>
>>> libqb 0.16.0, release 2.el6, on the systems with and without the
>>> memory leak problem.
>
> So you are using RHEL 6 with your own corosync 2.3.4 package?
>
>>> On the memory-leaking systems, nspr is 4.10.2; on the non-leaking
>>> systems, 4.10.8. However, 3 days earlier, before the configuration
>>> change, there was no memory leak.
>>>
>>> I have a cronjob which captures the output of ps every minute. Here are
>>> the "rss" and "sz" columns of corosync, with times:
>>>
>>> 17:53:01   3952     18271
>>> 17:54:02   18496    21889
>>> 17:58:01   70588    37823
>>> 18:00:01   113904   45790
>>> 18:06:01   207264   69177
>>> 18:10:01   245152   83857
>>> 18:15:01   338212   103154
>>> 18:20:01   402860   123200
>>> ...
>>> 19:00:01   985844   282619
>>> 19:10:01   1058220  322711
>>> ...
>>> 20:00:01   1960572  522400
>>> 20:30:01   2289444  642419
>>>
>>> Current configuration (with IPs smudged):
>>>
>>> quroum {
>>>     provider: corosync_votequorum
>>>     expected_votes: 2
>>> }
>>> aisexec {
>>>     user: root
>>>     group: root
>>> }
>>> #service {
>>> #    name: pacemaker
>>> #    ver: 1
>>> #}
>>> totem {
>>>     version: 2
>>>
>>>     # if "on" must use shared "corosync-keygen".
>>>     secauth: off
>>>     threads: 2
>>>     rrp_mode: none
>>>     transport: udpu
>>>     interface {
>>>         bindnetaddr: 172.24.0.0
>>>         # Rings must be consecutively numbered, starting at 0.
>>>         ringnumber: 0
>>>         mcastport: 5561
>>>     }
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_stderr: no
>>>     to_logfile: yes
>>>     logfile: /var/log/corosync.log
>>>     to_syslog: no
>>>     debug: off
>>>     timestamp: on
>>>     logger_subsys {
>>>         subsys: AMF
>>>         debug: on
>>>     }
>>> }
>>>
>>> nodelist {
>>>     node {
>>>         ring0_addr: 138.x.x.x
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.2.61
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.1.61
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.1.37
>>>     }
>>>     node {
>>>         ring0_addr: 172.24.2.37
>>>     }
>>> }
>>>
>>> Here is a diff of the configuration before/after (again, IPs smudged):
>>>
>>> diff --git a/corosync/corosync.conf b/corosync/corosync.conf
>>> index cc9c151..5e3e695 100644
>>> --- a/corosync/corosync.conf
>>> +++ b/corosync/corosync.conf
>>> @@ -23,15 +23,6 @@ totem {
>>>          # Rings must be consecutively numbered, starting at 0.
>>>          ringnumber: 0
>>>          mcastport: 5561
>>> -        member {
>>> -                memberaddr: 138.x.x.x
>>> -        }
>>> -        member {
>>> -                memberaddr: 172.24.1.61
>>> -        }
>>> -        member {
>>> -                memberaddr: 172.24.2.61
>>> -        }
>>>      }
>>> }
>>>
>>> @@ -48,3 +39,21 @@ logging {
>>>          debug: on
>>>      }
>>> }
>>> +
>>> +nodelist {
>>> +    node {
>>> +        ring0_addr: 138.x.x.x
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.2.61
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.1.61
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.1.37
>>> +    }
>>> +    node {
>>> +        ring0_addr: 172.24.2.37
>>> +    }
>>> +}
>>>
>>> On Thu, May 7, 2015 at 5:10 PM, Jan Friesse <[email protected]> wrote:
>>>
>>>> Otheus,
>>>>
>>>> otheus uibk wrote:
>>>>
>>>>> Here is a top excerpt from corosync 2.3.4 running for under 15 hours:
>>>>>
>>>>>   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> 15406 root   20   0 10.5g 9.2g 9168 S  2.0 58.8 11:50.07  corosync
>>>>>
>>>>> (I'm using a fixed-width font in Gmail; I have no idea what happens to
>>>>> this text when going via pipermail.)
>>>>>
>>>>> It's showing a RES usage of 9.2 GB. This high memory usage appeared
>>>>> after a relatively minor configuration change -- moving the listed
>>>>> nodes from totem.interface.member { } to nodelist.node { }. Two nodes
>>>>> were also added. AFAIK this was the only change.
>>>>
>>>> This looks like a serious issue. Are you able to reproduce it? What
>>>> version of libqb are you using?
>>>>
>>>> Regards,
>>>>   Honza
>>>>
>>>>> A review of changes since 2.3.4 indicates this has not been fixed
>>>>> since that release.

--
Otheus
[email protected]
[email protected]
_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
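
[Editor's note] The per-minute sampling described in the thread ("I have a cronjob which captures the output of ps every minute") could be set up roughly as in the sketch below. The thread does not show the actual job, so the file name /etc/cron.d/corosync-mem, the log path, and the exact ps invocation are assumptions; only the idea of recording a timestamp plus the rss and sz columns comes from the thread.

    # /etc/cron.d/corosync-mem -- hypothetical; appends "HH:MM:SS RSS SZ" for
    # each corosync process once per minute, like the tables quoted above.
    * * * * *  root  ps -C corosync -o rss=,sz= | awk -v t="$(date +\%H:\%M:\%S)" '{ print t, $1, $2 }' >> /var/log/corosync-mem.log

Note that the percent signs are escaped because cron otherwise treats an unescaped % in the command field as a newline.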
