Re: [OmniOS-discuss] ILB memory leak?

2015-11-10 Thread Dan McDonald

> On Nov 10, 2015, at 2:50 AM, Al Slater  wrote:
> 
> On 10/11/2015 07:40, Al Slater wrote:
>> It seems to me that ilbd_run_probe just needs to call
>> posix_spawn_file_actions_destroy appropriately.
> 
> And probably posix_spawnattr_destroy as well?

Wow!  Great catch.  I'll bet a small sum you nailed this to the wall.

Want me to build you a replacement ilbd?

Dan
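
One way to check Al's reading before swapping in a rebuilt daemon is to count how often the running ilbd sets up and tears down its posix_spawn state. The sketch below only prints a D program for the pid provider. The function name spawn_leak_probes is made up for this note, and it assumes the health-check probes are launched through posix_spawn in ilbd_run_probe, as Al describes.

```shell
# Print a D program that counts calls to the posix_spawn setup and
# teardown routines in the traced process (hypothetical helper name).
spawn_leak_probes() {
    cat <<'EOF'
pid$target::posix_spawn_file_actions_init:entry,
pid$target::posix_spawn_file_actions_destroy:entry,
pid$target::posix_spawnattr_init:entry,
pid$target::posix_spawnattr_destroy:entry
{
    @calls[probefunc] = count();
}
EOF
}

# Usage on the test box (as root); stop with Ctrl-C to print the counts:
#   dtrace -q -n "$(spawn_leak_probes)" -p `pgrep ilbd`
```

If the init counts keep climbing while the destroy counts stay at zero, that would be consistent with the missing destroy calls.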

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ILB memory leak?

2015-11-10 Thread Al Slater
On 10/11/15 15:26, Dan McDonald wrote:
> 
>> On Nov 10, 2015, at 2:50 AM, Al Slater  wrote:
>>
>> On 10/11/2015 07:40, Al Slater wrote:
>>> It seems to me that ilbd_run_probe just needs to call
>>> posix_spawn_file_actions_destroy appropriately.
>>
>> And probably posix_spawnattr_destroy as well?
> 
> Wow!  Great catch.  I'll bet a small sum you nailed this to the wall.
> 
> Want me to build you a replacement ilbd?

Yes please :)

Thanks for your, and Bob's, help with this.

-- 
Al Slater



Re: [OmniOS-discuss] ILB memory leak?

2015-11-09 Thread Al Slater

Hi Dan,

On 06/11/2015 18:31, Dan McDonald wrote:
> You said you had a test box, right?

Yes.

> Can you:
>
> - Disable UMEM_DEBUG
> - RESTART the service.
> - IMMEDIATELY after restart do pmap, and do pmap once per (sec, 10 sec,
>   something) to see how it grows?

Attached is a compressed file with 5hrs or so of 10s pmaps.  Hopefully
not too big for the list.

> After that, maybe we can dtrace and see what's going on.

--
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55


pmap.6589.gz
Description: application/gzip


Re: [OmniOS-discuss] ILB memory leak?

2015-11-09 Thread Al Slater
On 09/11/15 15:43, Dan McDonald wrote:
> 
>> On Nov 9, 2015, at 8:39 AM, Al Slater  wrote:
>> 
>> Attached is a compressed file with 5hrs or so of 10s pmaps.
>> Hopefully not too big for the list.
> 
> It compressed nicely.  I'm noticing a pattern:
> 
> Mon Nov  9 08:21:45 UTC 2015 total Kb  134008  133504  131416
> - Mon Nov  9 08:50:21 UTC 2015 total Kb  265080  264576  262488
> - Mon Nov  9 09:37:42 UTC 2015 total Kb  265088  264580  262492
> - Mon Nov  9 09:47:40 UTC 2015 total Kb  527232  526724  524636
> - Mon Nov  9 11:42:19 UTC 2015 total Kb 1051520 1050960 1048872
> - Mon Nov  9 11:42:29 UTC 2015 total Kb 1051520 1051012 1048924
> -
> 
> 
> It's mostly linear growth.  Notice the time intervals also double
> whenever the footprint essentially doubles?
> 
> So I need to back up and ask some things, especially given libumem
> doesn't appear to show leaks or even usage:
> 
> 1.) Is the eating of memory affecting your system performance?  (If
> you've only 8GB, yeah, I can see that.)

Hmmm...  I started investigating after the servers hung a couple of
times.  I have not conclusively proved that this was the cause, but the
machines have been running for months with no issue after I added a
cronjob to restart ilb twice a day.  I can see a gradual increase in
kernel memory use as well, but I have not investigated that.

> 2.) Is ilb failing after it gets sufficiently large?

Again, no link conclusively proved, but I did see log messages like the
following when the memory use had grown to 4Gb...

Nov  5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it

I looked at the source for ilbd and I think this could be caused by a
memory allocation failure in iu_schedule_timer.

After these messages were generated, it looks like the disabled servers
were never re-enabled, so eventually this could end up with no enabled
servers, and therefore no service, without manual intervention.

-- 
Al Slater
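
Dan's "mostly linear" observation can be checked with a little arithmetic on the samples quoted above: from 134008 KB at 08:21:45 to 1051520 KB at 11:42:19 is 917512 KB in 12034 seconds. A small helper does the division (the name growth_kb_per_sec is only for illustration):

```shell
# Average growth in KB/s between two "total Kb" samples.
# Arguments: first size (KB), second size (KB), elapsed seconds.
growth_kb_per_sec() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# 08:21:45 -> 11:42:19 is 3h 20m 34s = 12034 seconds.
growth_kb_per_sec 134008 1051520 12034
```

That comes to roughly 76 KB per second. The 08:21:45 to 08:50:21 interval on its own (131072 KB in 1716 seconds) gives about the same figure, which is what steady growth would look like.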



Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Al Slater

On 05/11/2015 14:57, Dan McDonald wrote:
>> On Nov 5, 2015, at 6:38 AM, Al Slater  wrote:
>>
>> I have the 4Gb core file.  Is there anything useful I can extract from
>> it to try and spot where the problem is?
>
> Your one ::findleaks showed nothing.  Did your 4GB corefile have ::findleaks
> show nothing as well?

::findleaks against the 4GB corefile showed nothing.

--
Al Slater





Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Dan McDonald

> On Nov 6, 2015, at 3:11 AM, Al Slater  wrote:
> 
> On 05/11/2015 14:57, Dan McDonald wrote:
>> 
>>> On Nov 5, 2015, at 6:38 AM, Al Slater  wrote:
>>> 
>>> I have the 4Gb core file.  Is there anything useful I can extract from
>>> it to try and spot where the problem is?
>> 
>> Your one ::findleaks showed nothing.  Did your 4GB corefile have ::findleaks 
>> show nothing as well?
> 
> ::findleaks against the 4GB corefile showed nothing.

None of the libumem stats show anything resembling leaks or even excessive 
allocation.

pmap(1) of the corefile is semi-interesting:

r151014(~/corefiles/Slater)[0]% pmap core.3041 
core 'core.3041' of 3041:   /usr/lib/inet/ilbd
0802E000     104K rw---  [ stack ]
08050000      76K r-x--  /usr/lib/inet/ilbd
08073000       4K rw---  /usr/lib/inet/ilbd
08074000    1252K rw---  [ heap ]
081B0000     256K rwx--  [ anon ]
08200000    2048K rwx--  [ anon ]
08410000     256K rwx--  [ anon ]
08460000     512K rwx--  [ anon ]
084F0000    1024K rwx--  [ anon ]
08600000    8192K rwx--  [ anon ]
08E10000     256K rwx--  [ anon ]
08E60000     512K rwx--  [ anon ]
08EF0000    1024K rwx--  [ anon ]
09000000   65536K rwx--  [ anon ]
0D010000     256K rwx--  [ anon ]
0D060000     512K rwx--  [ anon ]
0D0F0000    1024K rwx--  [ anon ]
0D200000  262144K rwx--  [ anon ]
1D210000     256K rwx--  [ anon ]
1D260000     512K rwx--  [ anon ]
1D2F0000    1024K rwx--  [ anon ]
1D400000  524288K rwx--  [ anon ]
3D410000     256K rwx--  [ anon ]
3D460000     512K rwx--  [ anon ]
3D4F0000    1024K rwx--  [ anon ]
3D600000 1048576K rwx--  [ anon ]
7D610000     256K rwx--  [ anon ]
7D660000     512K rwx--  [ anon ]
7D6F0000    1024K rwx--  [ anon ]
7D800000 1048576K rwx--  [ anon ]
BD810000     256K rwx--  [ anon ]
BD860000     512K rwx--  [ anon ]
BD8F0000    1024K rwx--  [ anon ]
BDA00000  524288K rwx--  [ anon ]
DDA10000     256K rwx--  [ anon ]
DDA60000     512K rwx--  [ anon ]
DDAF0000    1024K rwx--  [ anon ]
DDC00000  262144K rwx--  [ anon ]
EDC10000     256K rwx--  [ anon ]
EDC60000     512K rwx--  [ anon ]
EDCF0000    1024K rwx--  [ anon ]
EDE00000  131072K rwx--  [ anon ]
F5E10000     256K rwx--  [ anon ]
F5E60000     512K rwx--  [ anon ]
F5EF0000    1024K rwx--  [ anon ]
F6000000   65536K rwx--  [ anon ]
FA010000     256K rwx--  [ anon ]
FA060000     512K rwx--  [ anon ]
FA0F0000    1024K rwx--  [ anon ]
FA200000   32768K rwx--  [ anon ]
FC210000     256K rwx--  [ anon ]
FC260000     512K rwx--  [ anon ]
FC2F0000    1024K rwx--  [ anon ]
FC400000   16384K rwx--  [ anon ]
FD410000     256K rwx--  [ anon ]
FD460000     512K rwx--  [ anon ]
FD4F0000    1024K rwx--  [ anon ]
FD600000    8192K rwx--  [ anon ]
FDE10000     256K rwx--  [ anon ]
FDE60000     512K rwx--  [ anon ]
FDEF0000    1024K rwx--  [ anon ]
FE000000    4096K rwx--  [ anon ]
FE410000     256K rwx--  [ anon ]
FE460000     512K rwx--  [ anon ]
FE4F0000    1024K rwx--  [ anon ]
FE600000    2048K rwx--  [ anon ]
FE820000    1024K rwx--  [ anon ]
FE930000    1024K rwx--  [ anon ]
FEA40000     512K rwx--  [ anon ]
FEAD0000     256K rwx--  [ anon ]
FEB20000     128K rwx--  [ anon ]
FEB50000      64K rwx--  [ anon ]
FEB70000      64K rwx--  [ anon ]
FEB90000       4K rwx--  [ anon ]
FEBA0000      20K r-x--  /usr/lib/libilb.so.1
FEBB5000       4K rw---  /usr/lib/libilb.so.1
FEBC0000      32K r-x--  /lib/libuutil.so.1
FEBD8000       4K rw---  /lib/libuutil.so.1
FEBE0000       4K rwx--  [ anon ]
FEBF0000     172K r-x--  /lib/libscf.so.1
FEC2B000       4K rw---  /lib/libscf.so.1
FEC30000      20K r-x--  /lib/libinetutil.so.1
FEC45000       4K rw---  /lib/libinetutil.so.1
FEC50000       4K rwx--  [ anon ]
FEC60000      20K r-x--  /lib/libcmdutils.so.1
FEC75000       4K rw---  /lib/libcmdutils.so.1
FEC80000       4K r*     [ anon ]
FEC90000      64K rwx--  [ anon ]
FECB0000      64K rwx--  [ anon ]
FECD0000     416K r-x--  /lib/libnsl.so.1
FED48000       8K rw---  /lib/libnsl.so.1
FED4A000      20K rw---  /lib/libnsl.so.1
FED50000       4K rwx--  [ anon ]
FED60000      52K r-x--  /lib/libsocket.so.1
FED7D000       4K rw---  /lib/libsocket.so.1
FED80000      24K rwx--  [ anon ]
FED90000    1252K r-x--  /lib/libc.so.1
FEED9000      36K rwx--  /lib/libc.so.1
FEEE2000       8K rwx--  /lib/libc.so.1
FEEF0000       4K rwx--  [ anon ]
FEF00000     196K r-x--  /lib/libumem.so.1
FEF40000       8K rwx--  /lib/libumem.so.1
FEF52000      76K rw---  /lib/libumem.so.1
FEF65000      24K rw---  /lib/libumem.so.1
FEF70000       4K r*     [ anon ]
FEF80000       4K rwx--  [ anon ]
FEF90000       4K rw---  [ anon ]
FEFA0000       4K rw---  [ anon ]
FEFB0000       4K rwx--  [ anon ]
FEFB5000     216K r-x--  /lib/ld.so.1
FEFFB000       8K rwx--  /lib/ld.so.1
FEFFD000       4K rwx--  /lib/ld.so.1
 total   4040340K
r151014(~/corefiles/Slater)[0]% 

Lots of LARGE anonymous mappings.  I wonder why that happened? I'll dig into
that a bit more.

Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Dan McDonald

> On Nov 6, 2015, at 9:39 AM, Dan McDonald  wrote:
> 
> Lots of LARGE anonymous mappings.  I wonder why that happened? I'll dig into 
> that a bit more.

pmap(1) works even better on running processes.  Could you run, say "pmap -xa 
`pgrep ilbd`" on your running machine?

Dan



Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Al Slater
On 06/11/15 14:51, Dan McDonald wrote:
> 
>> On Nov 6, 2015, at 9:39 AM, Dan McDonald  wrote:
>>
>> Lots of LARGE anonymous mappings.  I wonder why that happened? I'll dig into 
>> that a bit more.
> 
> pmap(1) works even better on running processes.  Could you run, say "pmap -xa 
> `pgrep ilbd`" on your running machine?

Here you go...

root@loki:/export/home/BRIGHTON/aslate# pmap -xa `pgrep ilbd`
12346:  /usr/lib/inet/ilbd
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
08027000     132     132     132       - rw---    [ stack ]
08050000      76      76       -       - r-x--  ilbd
08073000       4       4       4       - rw---  ilbd
08074000      96       -       -       - rw---  ilbd
0808C000    1156    1140    1112       - rw---    [ heap ]
0D200000  262144  262144  262144       - rwx--    [ anon ]
1D400000  524288  524288  524288       - rwx--    [ anon ]
3D600000 1048576 1048576 1048576       - rwx--    [ anon ]
7D800000 1048576 1048576 1048576       - rwx--    [ anon ]
BDA00000  524288  524288  524288       - rwx--    [ anon ]
DDC00000  262144  262144  262144       - rwx--    [ anon ]
EDE00000  131072  131072  131072       - rwx--    [ anon ]
F6000000   65536   65536   65536       - rwx--    [ anon ]
FA200000   32768   32768   32768       - rwx--    [ anon ]
FC400000   16384   16384   16384       - rwx--    [ anon ]
FD600000    8192    8192    8192       - rwx--    [ anon ]
FE000000    4096    4096    4096       - rwx--    [ anon ]
FE600000    2048    2048    2048       - rwx--    [ anon ]
FE8A0000      36      16       -       - r-x--  libtsol.so.2
FE8B9000       4       4       4       - rw---  libtsol.so.2
FE8C0000       4       4       4       - rwx--    [ anon ]
FE8D0000     140     112       -       - r-x--  libbsm.so.1
FE903000      28      28      28       - rw---  libbsm.so.1
FE90A000       4       -       -       - rw---  libbsm.so.1
FE910000      16      16       -       - r-x--  libsecdb.so.1
FE924000       4       4       4       - rw---  libsecdb.so.1
FE930000    1024    1024    1024       - rwx--    [ anon ]
FEA40000     512     512     512       - rwx--    [ anon ]
FEAD0000     256     256     256       - rwx--    [ anon ]
FEB20000     128     128     128       - rwx--    [ anon ]
FEB50000      64      64      64       - rwx--    [ anon ]
FEB70000      64      16      16       - rwx--    [ anon ]
FEB90000       4       4       4       - rwx--    [ anon ]
FEBA0000      20      20       -       - r-x--  libilb.so.1
FEBB5000       4       4       4       - rw---  libilb.so.1
FEBC0000      32      32       -       - r-x--  libuutil.so.1
FEBD8000       4       4       4       - rw---  libuutil.so.1
FEBE0000       4       4       4       - rwx--    [ anon ]
FEBF0000     172     148       -       - r-x--  libscf.so.1
FEC2B000       4       4       4       - rw---  libscf.so.1
FEC30000      20      20       -       - r-x--  libinetutil.so.1
FEC45000       4       4       4       - rw---  libinetutil.so.1
FEC50000       4       4       4       - rwx--    [ anon ]
FEC60000      20      12       -       - r-x--  libcmdutils.so.1
FEC75000       4       4       4       - rw---  libcmdutils.so.1
FEC80000       4       4       -       - r--s-  dev:528,24 ino:2821218250
FEC90000      64      64       4       - rwx--    [ anon ]
FECB0000      64      64       4       - rwx--    [ anon ]
FECD0000     416     368       -       - r-x--  libnsl.so.1
FED48000       8       8       8       - rw---  libnsl.so.1
FED4A000      20      16       4       - rw---  libnsl.so.1
FED50000       4       4       4       - rwx--    [ anon ]
FED60000      52      48       -       - r-x--  libsocket.so.1
FED7D000       4       4       4       - rw---  libsocket.so.1
FED80000      24      12      12       - rwx--    [ anon ]
FED90000    1252     936       -       - r-x--  libc_hwcap1.so.1
FEED9000      36      36      32       - rwx--  libc_hwcap1.so.1
FEEE2000       8       8       8       - rwx--  libc_hwcap1.so.1
FEEF0000       4       4       4       - rwx--    [ anon ]
FEF00000     196     112       -       - r-x--  libumem.so.1
FEF40000       8       4       4       - rwx--  libumem.so.1
FEF52000      76      72      16       - rw---  libumem.so.1
FEF65000      24      24      24       - rw---  libumem.so.1
FEF70000       4       4       -       - r--s-  ld.config
FEF80000       4       4       4       - rwx--    [ anon ]
FEF90000       4       4       4       - rw---    [ anon ]
FEFA0000       4       4       4       - rw---    [ anon ]
FEFB0000       4       4       4       - rwx--    [ anon ]
FEFB5000     216     216       -       - r-x--  ld.so.1
FEFFB000       8       8       8       - rwx--  ld.so.1
FEFFD000       4       4       4       - rwx--  ld.so.1
-------- ------- ------- ------- -------
total Kb 3936668 3935948 3933588

-- 
Al Slater

Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Dan McDonald

> On Nov 6, 2015, at 10:57 AM, Al Slater  wrote:
> 
> 
> 7D800000 1048576 1048576 1048576       - rwx--    [ anon ]
> BDA00000  524288  524288  524288       - rwx--    [ anon ]
> DDC00000  262144  262144  262144       - rwx--    [ anon ]
> EDE00000  131072  131072  131072       - rwx--    [ anon ]

More huge anonymous mappings (1G, 512MB, 256MB, 128MB).

I don't know pmap as well as I should.  I don't see anything in the man page to 
give me further insight into why these chunks of memory are being eaten.

Dan
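
To see how much of the footprint those big mappings account for, the Kbytes column of "pmap -x" output can be summed for anonymous mappings above some size. This is only a sketch: the function name sum_large_anon_kb and the default threshold are invented here, and it assumes the usual "pmap -x" column layout (Address, Kbytes, RSS, Anon, Locked, Mode, Mapped File).

```shell
# Sum the Kbytes column for "[ anon ]" mappings of at least MIN KB.
# Reads "pmap -x" output on stdin; MIN defaults to 1024.
sum_large_anon_kb() {
    awk -v min="${1:-1024}" '
        $7 == "[" && $8 == "anon" && $2 + 0 >= min + 0 { sum += $2 }
        END { print sum + 0 }'
}

# Usage:
#   pmap -x `pgrep ilbd` | sum_large_anon_kb 2048
```

In Al's "pmap -xa" output, the thirteen anonymous mappings of 2048 KB or more add up to 3930112 KB of the 3936668 KB total.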



Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Dan McDonald

> On Nov 6, 2015, at 11:25 AM, Dan McDonald  wrote:
> 
>> On Nov 6, 2015, at 10:57 AM, Al Slater  wrote:
>> 
>> 
>> 7D800000 1048576 1048576 1048576       - rwx--    [ anon ]
>> BDA00000  524288  524288  524288       - rwx--    [ anon ]
>> DDC00000  262144  262144  262144       - rwx--    [ anon ]
>> EDE00000  131072  131072  131072       - rwx--    [ anon ]
> 
> More huge anonymous mappings (1G, 512MB, 256MB, 128MB).
> 

You said you had a test box, right?

Can you:

- Disable UMEM_DEBUG
- RESTART the service.
- IMMEDIATELY after restart do pmap, and do pmap once per (sec, 10 sec, 
something) to see how it grows?

After that, maybe we can dtrace and see what's going on.

Dan
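
The sampling described above can be done with a short loop. This is a sketch, not necessarily what Al ran. The function names are invented here, and the defaults (a 10 second interval and 1800 samples, about five hours) are arbitrary.

```shell
# Print one timestamped "total Kb" line from pmap -x for the given PID.
sample_pmap_total() {
    printf '%s %s\n' "$(date -u)" "$(pmap -x "$1" | tail -n 1)"
}

# Take COUNT samples of PID, INTERVAL seconds apart.
watch_pmap() {
    pid=$1
    interval=${2:-10}
    count=${3:-1800}
    i=0
    while [ "$i" -lt "$count" ]; do
        sample_pmap_total "$pid"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (as root, right after restarting the service):
#   watch_pmap `pgrep ilbd` 10 1800 > pmap-totals.log
```

Keeping only the total line makes the log small; swap in the full "pmap -x" output if the per-mapping breakdown is wanted.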



Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Bob Friesenhahn

On Fri, 6 Nov 2015, Dan McDonald wrote:

> More huge anonymous mappings (1G, 512MB, 256MB, 128MB).
>
> I don't know pmap as well as I should.  I don't see anything in the
> man page to give me further insight into why these chunks of memory
> are being eaten.

It is pretty common for memory allocators to use anonymous mappings
for large memory allocations.  This allows releasing memory back to
the system.

Some applications use algorithms where they double the memory size
request from the previous request when a little more memory is
required in order to lessen the hit from many realloc() calls.  This
might explain the power-of-two sizes.  If this is being done, the
smaller power-of-two allocations may be a bug.

Tracing mmap() calls on the program while it is running might reveal
something.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [OmniOS-discuss] ILB memory leak?

2015-11-05 Thread Al Slater

To the mailing list as well...

On 22/10/2015 09:43, Al Slater wrote:
> On 21/10/2015 17:35, Dan McDonald wrote:
>>
>>> On Oct 21, 2015, at 6:08 AM, Al Slater 
>>> wrote:
>>>
>>> Hi,
>>>
>>> I am running omnios r151014 on a couple of machines with a couple
>>> of zones each.  1 zone runs apache as an SSL reverse proxy, the
>>> other runs ILB for load balancing web to app tier connections.
>>>
>>> I noticed that in the ILB zone, the ilbd process memory grows to
>>> about 2Gb.   Restarting ILB releases the memory, and then the
>>> memory usage gradually increases again, with each memory increase
>>> approximately 2 * the size of the previous one.  I run a cronjob
>>> twice a day ( 8am and 8pm) which restarts the ilb service and
>>> releases the memory.
>>>
>>> A graph of memory usage is available at
>>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>>
>>> There are currently 62 rules in the load balancer, with a total
>>> of 664 server/port pairs.
>>>
>>> Is there anything I can provide that would help track this down?
>>
>> You can use svccfg(1M) to enable user-level memory debugging on ilb.
>>   It may cause the ilb daemon to dump core.  (And you're just noticing
>>   this in the process, not kernel memory consumption, correct?)
>
> I am seeing kernel memory consumption increasing as well, but that may
> be a different issue.  The ilbd process memory is definitely growing.
>
>> As root:
>>
>> svcadm disable -t ilb
>> svccfg -s ilb setenv LD_PRELOAD libumem.so
>> svccfg -s ilb setenv UMEM_DEBUG default
>> svccfg -s ilb refresh
>> svcadm enable ilb
>>
>> That should enable user-level memory debugging.  If you get a
>> coredump, save it and share it.  If you don't and the ilb daemon
>> keeps running, eventually please:
>>
>> gcore `pgrep ilbd`
>>
>> and share THAT corefile.  You can also do this by yourself:
>>
>> mdb
>> > ::findleaks
>>
>> and share ::findleaks.
>>
>> Once you're done generating corefiles, repeat the steps above, but
>> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
>> setenv lines.
>
> Thanks Dan.  As we are talking about production boxes here, I will have
> to try and reproduce on another box and then I will give the process
> above a go and see what we come up with.

I have reproduced the problem on a test box.

prstat shows:

  3041 daemon   3946M 3946M sleep   59    0   0:48:03 0.1% ilbd/1


memstat:

root@loki:/export/home/BRIGHTON/aslate# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     238420               931   12%
ZFS File Data              630861              2464   31%
Anon                      1054835              4120   51%
Exec and libs                2204                 8    0%
Page cache                  10624                41    1%
Free (cachelist)             9236                36    0%
Free (freelist)            105626               412    5%

Total                     2051806              8014
Physical                  2051805              8014

mdb findleaks:

root@loki:/export/home/BRIGHTON/aslate# mdb core.3041
Loading modules: [ libumem.so.1 libc.so.1 libcmdutils.so.1 libuutil.so.1
ld.so.1 ]
 > ::findleaks
findleaks: no memory leaks detected
 >

Now, I am seeing lots of log messages like the following in
/var/adm/messages

Nov  5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it


So, I was wrong about growing to 2Gb, the truth is nearer 4Gb.  I am
guessing that ilbd_hc_restart_timer is failing because no more memory
can be allocated.

I have the 4Gb core file.  Is there anything useful I can extract from
it to try and spot where the problem is?


-- Al Slater





Re: [OmniOS-discuss] ILB memory leak?

2015-11-05 Thread Al Slater

Hi Dan,

On 05/11/2015 14:57, Dan McDonald wrote:
>> On Nov 5, 2015, at 6:38 AM, Al Slater  wrote:
>>
>> I have the 4Gb core file.  Is there anything useful I can extract
>> from it to try and spot where the problem is?
>
> Your one ::findleaks showed nothing.  Did your 4GB corefile have
> ::findleaks show nothing as well?
>
> ::umausers may be helpful.

root@loki:/export/home/BRIGHTON/aslate# mdb core.3041
Loading modules: [ libumem.so.1 libc.so.1 libcmdutils.so.1 libuutil.so.1
ld.so.1 ]
> ::umausers

71424 bytes for 62 allocations with data size 1152:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_ilbd_alloc_sg+0x13
 ilbd_create_sg+0x9a
 ilbd_scf_instance_walk_pg+0x2a6
 ilbd_walk_sg_pgs+0x37
 i_ilbd_read_config+0x28
 main_loop+0x7f
 main+0x1d3
 _start+0x83
53120 bytes for 664 allocations with data size 80:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 ilbd_hc_srv_add+0x18
 ilbd_hc_associate_rule+0xd8
 ilbd_create_rule+0x1a3
 ilbd_scf_instance_walk_pg+0x1c4
 ilbd_walk_rule_pgs+0x37
 i_ilbd_read_config+0x4e
 main_loop+0x7f
 main+0x1d3
 _start+0x83
53120 bytes for 664 allocations with data size 80:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_add_srv2sg+0x15
 ilbd_add_server_to_group+0x310
 ilbd_scf_instance_walk_pg+0x2dd
 ilbd_walk_sg_pgs+0x37
 i_ilbd_read_config+0x28
 main_loop+0x7f
 main+0x1d3
 _start+0x83
31584 bytes for 658 allocations with data size 48:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x99
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 libinetutil.so.1`iu_schedule_timer_ms+0x2d
 libinetutil.so.1`iu_schedule_timer+0x37
 ilbd_hc_restart_timer+0xbc
 ilbd_hc_probe_timer+0x23
 libinetutil.so.1`iu_expire_timers+0xbe
 ilbd_hc_timeout+0x11
 main_loop+0xe6
 main+0x1d3
 _start+0x83
12288 bytes for 1 allocations with data size 12288:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libc.so.1`ltzset_u+0xa2
 libc.so.1`localtime_r+0x35
 libc.so.1`ctime_r+0x2c
 libc.so.1`vsyslog+0x1e4
 ilbd_log+0x48
 main+0x15e
 _start+0x83
10368 bytes for 54 allocations with data size 192:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x99
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_alloc_ilbd_rule+0x17
 ilbd_create_rule+0xfa
 ilbd_scf_instance_walk_pg+0x1c4
 ilbd_walk_rule_pgs+0x37
 i_ilbd_read_config+0x4e
 main_loop+0x7f
 main+0x1d3
 _start+0x83



> Sharing the corefile would also be helpful.

I have put it on dropbox

https://www.dropbox.com/s/y6cv78d1xk5j5u7/core.3041.gz?dl=0

> I'm assuming, given you see problems at 4GB that ilbd is a 32-bit
> process, right?

Yes,

#  file /usr/lib/inet/ilbd
/usr/lib/inet/ilbd: ELF 32-bit LSB executable 80386 Version 1,
dynamically linked, not stripped, no debugging information available

cheers

--
Al Slater




Re: [OmniOS-discuss] ILB memory leak?

2015-11-05 Thread Dan McDonald

> On Nov 5, 2015, at 6:38 AM, Al Slater  wrote:
> 
> I have the 4Gb core file.  Is there anything useful I can extract from
> it to try and spot where the problem is?

Your one ::findleaks showed nothing.  Did your 4GB corefile have ::findleaks 
show nothing as well?

::umausers may be helpful.

Sharing the corefile would also be helpful.  I'm assuming, given you see 
problems at 4GB that ilbd is a 32-bit process, right?

Thanks,
Dan
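
Both dcmds can be run in one non-interactive pass by piping them into mdb, the same way "echo ::memstat | mdb -k" is used elsewhere in this thread. A minimal sketch (the function name mdb_leak_report is made up for this note):

```shell
# Run ::findleaks and ::umausers against a core file and print the results.
mdb_leak_report() {
    printf '%s\n' '::findleaks' '::umausers' | mdb "$1"
}

# Usage, after "gcore `pgrep ilbd`" has written core.<pid>:
#   mdb_leak_report core.3041 > ilbd-umem-report.txt
```

This only gives useful output if the daemon was started with libumem preloaded and UMEM_DEBUG set, as described earlier in the thread.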



Re: [OmniOS-discuss] ILB memory leak?

2015-10-23 Thread Richard PALO
Le 22/10/15 10:43, Al Slater a écrit :
> I am seeing kernel memory consumption increasing as well, but that may 
> be a different issue.  The ilbd process memory is definitely growing.
> 

This is indeed probably a different issue, but it would be useful to create
a thread on illumos discuss, as I'm seeing it as well (not using ILB).  For
example, running a number of rather intensive builds, I see kernel memory
steadily going up to ~40%:
> richard@omnis:/home/richard$ swap -hs ; echo ::memstat |pfexec mdb -k
> total: 1,8G allocated + 311M reserved = 2,1G used, 40G available
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                    3231113             12621   39%
> ZFS File Data             2944763             11502   35%
> Anon                       452803              1768    5%
> Exec and libs                5088                19    0%
> Page cache                  65892               257    1%
> Free (cachelist)            70820               276    1%
> Free (freelist)           1614595              6307   19%
>
> Total                     8385074             32754
> Physical                  8385072             32754



-- 
Richard PALO



Re: [OmniOS-discuss] ILB memory leak?

2015-10-22 Thread Al Slater

On 21/10/2015 17:35, Dan McDonald wrote:
>> On Oct 21, 2015, at 6:08 AM, Al Slater  wrote:
>>
>> Hi,
>>
>> I am running omnios r151014 on a couple of machines with a couple
>> of zones each.  1 zone runs apache as an SSL reverse proxy, the
>> other runs ILB for load balancing web to app tier connections.
>>
>> I noticed that in the ILB zone, the ilbd process memory grows to
>> about 2Gb.   Restarting ILB releases the memory, and then the
>> memory usage gradually increases again, with each memory increase
>> approximately 2 * the size of the previous one.  I run a cronjob
>> twice a day ( 8am and 8pm) which restarts the ilb service and
>> releases the memory.
>>
>> A graph of memory usage is available at
>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>
>> There are currently 62 rules in the load balancer, with a total
>> of 664 server/port pairs.
>>
>> Is there anything I can provide that would help track this down?
>
> You can use svccfg(1M) to enable user-level memory debugging on ilb.
> It may cause the ilb daemon to dump core.  (And you're just noticing
> this in the process, not kernel memory consumption, correct?)

I am seeing kernel memory consumption increasing as well, but that may
be a different issue.  The ilbd process memory is definitely growing.

> As root:
>
> svcadm disable -t ilb
> svccfg -s ilb setenv LD_PRELOAD libumem.so
> svccfg -s ilb setenv UMEM_DEBUG default
> svccfg -s ilb refresh
> svcadm enable ilb
>
> That should enable user-level memory debugging.  If you get a
> coredump, save it and share it.  If you don't and the ilb daemon
> keeps running, eventually please:
>
> gcore `pgrep ilbd`
>
> and share THAT corefile.  You can also do this by yourself:
>
> mdb
> > ::findleaks
>
> and share ::findleaks.
>
> Once you're done generating corefiles, repeat the steps above, but
> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
> setenv lines.

Thanks Dan.  As we are talking about production boxes here, I will have
to try and reproduce on another box and then I will give the process
above a go and see what we come up with.


--
Al Slater




Re: [OmniOS-discuss] ILB memory leak?

2015-10-22 Thread Ryan Zezeski

Al Slater writes:

> On 21/10/2015 17:35, Dan McDonald wrote:
>>
>> That should enable user-level memory debugging.  If you get a
>> coredump, save it and share it.  If you don't and the ilb daemon
>> keeps running, eventually please:
>>
>> gcore `pgrep ilbd`
>>
>> and share THAT corefile.  You can also do this by yourself:
>>
>> mdb
>> > ::findleaks
>>
>> and share ::findleaks.
>>
>> Once you're done generating corefiles, repeat the steps above, but
>> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
>> setenv lines.
>
> Thanks Dan.  As we are talking about production boxes here, I will have 
> to try and reproduce on another box and then I will give the process 
> above a go and see what we come up with.

You can also use the DTrace pid provider to grab the user stack on every
malloc(3C) call, and the syscall provider to track mmap(2) calls. That
poses no harm to production and might make the cause of memory usage
obvious.

Something like:

dtrace -qn 'pid$target::malloc:entry { @[ustack()] = count(); }
syscall::mmap*:entry /pid == $target/ { @[ustack()] = count(); }' -p 

Let that run for a while as the memory grows, then Ctrl-C.

-Z


Re: [OmniOS-discuss] ILB memory leak?

2015-10-21 Thread Dan McDonald

> On Oct 21, 2015, at 6:08 AM, Al Slater  wrote:
> 
> Hi,
> 
> I am running omnios r151014 on a couple of machines with a couple of zones 
> each.  1 zone runs apache as an SSL reverse proxy, the other runs ILB for 
> load balancing web to app tier connections.
> 
> I noticed that in the ILB zone, the ilbd process memory grows to about 2Gb.   
> Restarting ILB releases the memory, and then the memory usage gradually 
> increases again, with each memory increase approximately 2 * the size of the 
> previous one.  I run a cronjob twice a day ( 8am and 8pm) which restarts the 
> ilb service and releases the memory.
> 
> A graph of memory usage is available at 
> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
> 
> There are currently 62 rules in the load balancer, with a total of 664 
> server/port pairs.
> 
> Is there anything I can provide that would help track this down?

You can use svccfg(1M) to enable user-level memory debugging on ilb.  It may 
cause the ilb daemon to dump core.  (And you're just noticing this in the 
process, not kernel memory consumption, correct?)

As root:

svcadm disable -t ilb
svccfg -s ilb setenv LD_PRELOAD libumem.so
svccfg -s ilb setenv UMEM_DEBUG default
svccfg -s ilb refresh
svcadm enable ilb

That should enable user-level memory debugging.  If you get a coredump, save it 
and share it.  If you don't and the ilb daemon keeps running, eventually please:

gcore `pgrep ilbd`

and share THAT corefile.  You can also do this by yourself:

mdb 
> ::findleaks

and share ::findleaks.

Once you're done generating corefiles, repeat the steps above, but use 
"unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the setenv lines.

Hope this helps,
Dan
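
Spelled out, the cleanup pass described above would look something like the sketch below. The wrapper function and the RUN variable are inventions of this note, added so the steps can be previewed first: by default it only echoes the commands, and running it as "RUN= umem_debug_off" executes them for real.

```shell
# Undo the libumem debugging setup on the ilb service.
# With RUN unset this is a dry run that just prints each command.
umem_debug_off() {
    run=${RUN-echo}
    $run svcadm disable -t ilb
    $run svccfg -s ilb unsetenv LD_PRELOAD
    $run svccfg -s ilb unsetenv UMEM_DEBUG
    $run svccfg -s ilb refresh
    $run svcadm enable ilb
}

# Preview:   umem_debug_off
# Execute:   RUN= umem_debug_off     (as root)
```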



Re: [OmniOS-discuss] ILB memory leak?

2015-10-21 Thread Bob Friesenhahn

On Wed, 21 Oct 2015, Dan McDonald wrote:

> You can use svccfg(1M) to enable user-level memory debugging on ilb.  It may
> cause the ilb daemon to dump core.  (And you're just noticing this in the
> process, not kernel memory consumption, correct?)
>
> As root:
>
> svcadm disable -t ilb
> svccfg -s ilb setenv LD_PRELOAD libumem.so
> svccfg -s ilb setenv UMEM_DEBUG default
> svccfg -s ilb refresh
> svcadm enable ilb

Is there a way to use ulimit to limit the data segment size (ulimit
-d)?  If this is possible, then a dumped core (due to hitting the
limit) may point directly to the guilty party.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[OmniOS-discuss] ILB memory leak?

2015-10-21 Thread Al Slater

Hi,

I am running omnios r151014 on a couple of machines with a couple of 
zones each.  1 zone runs apache as an SSL reverse proxy, the other runs 
ILB for load balancing web to app tier connections.


I noticed that in the ILB zone, the ilbd process memory grows to about 
2Gb.   Restarting ILB releases the memory, and then the memory usage 
gradually increases again, with each memory increase approximately 2 * 
the size of the previous one.  I run a cronjob twice a day ( 8am and 
8pm) which restarts the ilb service and releases the memory.


A graph of memory usage is available at 
https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0


There are currently 62 rules in the load balancer, with a total of 664 
server/port pairs.


Is there anything I can provide that would help track this down?


--
Al Slater

