On Tue, May 8, 2018 at 9:16 AM Nir Soffer <[email protected]> wrote:

> On Mon, May 7, 2018 at 6:48 PM Chris Adams <[email protected]> wrote:
>
>> I have a problem with a memory leak in vdsm.  I have a dev cluster that
>> right now is:
>>
>> - two nodes
>> - CentOS 7.4 (up to date)
>> - oVirt 4.2.2 (installed as 3.5 and upgraded version by version)
>> - hosted engine (no other running VM at the moment)
>> - iSCSI storage
>>
>> I have a script that writes the vdsm RSS to a file every five minutes,
>> and on the node holding the hosted engine, vdsm RSS grows around
>> 300-1500KB every snapshot.
>>
>> I maintain several oVirt clusters for others, and they all seem to have
>> this problem.  The production clusters are all still on oVirt 4.1, but
>> they all have this problem too, so I guess it is something about how I
>> set them up?  On a couple I just checked, the vdsm RSS is over 1G.
>>
>> Any tips on instrumenting vdsm to track this down?  I am unfortunately
>> only passingly familiar with python (I can make small changes, but not
>> knowledgeable enough to figure this out).
>>
>
> To debug these issues, you should enable the health monitor by creating
> a drop-in configuration file:
>
> $ cat /etc/vdsm/vdsm.conf.d/health.conf
> [devel]
> health_monitor_enable = true
>
> And restart vdsm to start the health monitor.
>
> The health logs are using DEBUG level so you need to enable
> DEBUG level for the "health" logger. You can do this with:
>
>     $ vdsm-client Host setLogLevel level=DEBUG name=health
>
> Or by adding new logger configuration to /etc/vdsm/logger.conf:
>
> 1. add health logger to [loggers]
>
> [loggers]
>
> keys=root,vds,storage,virt,ovirt_hosted_engine_ha,ovirt_hosted_engine_ha_config,IOProcess,devel,health
>
> 2. add [logger_health] section
>
> [logger_health]
> level=DEBUG
> handlers=logthread
> qualname=health
> propagate=0
>
> Finally post here the [health] logs from vdsm.log.
>
>

Here are example logs on a host running 4.2.3 without hosted engine:

# grep '(health)' /var/log/vdsm/vdsm.log
2018-05-08 10:54:29,424+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:54:29,451+0300 DEBUG (health) [health] Collected 4710 objects
(health:97)
2018-05-08 10:54:29,451+0300 DEBUG (health) [health] user=2.22%, sys=0.92%,
rss=84192 kB (+23436), threads=70 (health:122)
2018-05-08 10:55:29,451+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:55:29,472+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 10:55:29,474+0300 DEBUG (health) [health] user=0.60%, sys=0.38%,
rss=84236 kB (+44), threads=62 (health:122)
2018-05-08 10:56:29,475+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:56:29,496+0300 DEBUG (health) [health] Collected 48 objects
(health:97)
2018-05-08 10:56:29,496+0300 DEBUG (health) [health] user=0.58%, sys=0.42%,
rss=84444 kB (+208), threads=62 (health:122)
2018-05-08 10:57:29,497+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:57:29,521+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 10:57:29,521+0300 DEBUG (health) [health] user=0.58%, sys=0.42%,
rss=84452 kB (+8), threads=62 (health:122)
2018-05-08 10:58:29,522+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:58:29,545+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 10:58:29,545+0300 DEBUG (health) [health] user=0.60%, sys=0.42%,
rss=84500 kB (+48), threads=62 (health:122)
2018-05-08 10:59:29,546+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 10:59:29,569+0300 DEBUG (health) [health] Collected 137 objects
(health:97)
2018-05-08 10:59:29,570+0300 DEBUG (health) [health] user=0.70%, sys=0.48%,
rss=84876 kB (+376), threads=69 (health:122)
2018-05-08 11:00:29,570+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 11:00:29,593+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 11:00:29,594+0300 DEBUG (health) [health] user=0.60%, sys=0.43%,
rss=84744 kB (-132), threads=62 (health:122)
2018-05-08 11:01:29,594+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 11:01:29,617+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 11:01:29,617+0300 DEBUG (health) [health] user=0.65%, sys=0.40%,
rss=84748 kB (+4), threads=62 (health:122)
2018-05-08 11:02:29,618+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 11:02:29,641+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 11:02:29,641+0300 DEBUG (health) [health] user=0.57%, sys=0.40%,
rss=84748 kB (+0), threads=62 (health:122)
2018-05-08 11:03:29,642+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 11:03:29,665+0300 DEBUG (health) [health] Collected 43 objects
(health:97)
2018-05-08 11:03:29,665+0300 DEBUG (health) [health] user=0.60%, sys=0.43%,
rss=84812 kB (+64), threads=62 (health:122)
2018-05-08 11:04:29,666+0300 DEBUG (health) [health] Checking health
(health:90)
2018-05-08 11:04:29,690+0300 DEBUG (health) [health] Collected 137 objects
(health:97)
2018-05-08 11:04:29,690+0300 DEBUG (health) [health] user=0.73%, sys=0.47%,
rss=85008 kB (+196), threads=69 (health:122)

Health stats are also reported using the configured metrics collector,
(see [metrics] section in vdsm.conf) using these names:

hosts.vdsm.gc.uncollectable
hosts.vdsm.cpu.user_pct
hosts.vdsm.cpu.sys_pct
hosts.vdsm.cpu.memory.rss
hosts.vdsm.threads_count

Nir
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

Reply via email to