On Tue, May 8, 2018 at 9:16 AM Nir Soffer <[email protected]> wrote:
> On Mon, May 7, 2018 at 6:48 PM Chris Adams <[email protected]> wrote:
>
>> I have a problem with a memory leak in vdsm. I have a dev cluster that
>> right now is:
>>
>> - two nodes
>> - CentOS 7.4 (up to date)
>> - oVirt 4.2.2 (installed as 3.5 and upgraded version by version)
>> - hosted engine (no other running VM at the moment)
>> - iSCSI storage
>>
>> I have a script that writes the vdsm RSS to a file every five minutes,
>> and on the node holding the hosted engine, vdsm RSS grows around
>> 300-1500KB every snapshot.
>>
>> I maintain several oVirt clusters for others, and they all seem to have
>> this problem. The production clusters are all still on oVirt 4.1, but
>> they all have this problem too, so I guess it is something about how I
>> set them up? On a couple I just checked, the vdsm RSS is over 1G.
>>
>> Any tips on instrumenting vdsm to track this down? I am unfortunately
>> only passingly familiar with python (I can make small changes, but not
>> knowledgeable enough to figure this out).
>>
>
> To debug these issues, you should enable the health monitor by creating
> a drop-in configuration file:
>
>     $ cat /etc/vdsm/vdsm.conf.d/health.conf
>     [devel]
>     health_monitor_enable = true
>
> And restart vdsm to start the health monitor.
>
> The health logs are using DEBUG level so you need to enable
> DEBUG level for the "health" logger. You can do this with:
>
>     $ vdsm-client Host setLogLevel level=DEBUG name=health
>
> Or by adding new logger configuration to /etc/vdsm/logger.conf:
>
> 1. add health logger to [loggers]
>
>     [loggers]
>     keys=root,vds,storage,virt,ovirt_hosted_engine_ha,ovirt_hosted_engine_ha_config,IOProcess,devel,health
>
> 2. add [logger_health] section
>
>     [logger_health]
>     level=DEBUG
>     handlers=logthread
>     qualname=health
>     propagate=0
>
> Finally post here the [health] logs from vdsm.log.
>

Here are example logs on a host running 4.2.3 without hosted engine:

# grep '(health)' /var/log/vdsm/vdsm.log
2018-05-08 10:54:29,424+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:54:29,451+0300 DEBUG (health) [health] Collected 4710 objects (health:97)
2018-05-08 10:54:29,451+0300 DEBUG (health) [health] user=2.22%, sys=0.92%, rss=84192 kB (+23436), threads=70 (health:122)
2018-05-08 10:55:29,451+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:55:29,472+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 10:55:29,474+0300 DEBUG (health) [health] user=0.60%, sys=0.38%, rss=84236 kB (+44), threads=62 (health:122)
2018-05-08 10:56:29,475+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:56:29,496+0300 DEBUG (health) [health] Collected 48 objects (health:97)
2018-05-08 10:56:29,496+0300 DEBUG (health) [health] user=0.58%, sys=0.42%, rss=84444 kB (+208), threads=62 (health:122)
2018-05-08 10:57:29,497+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:57:29,521+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 10:57:29,521+0300 DEBUG (health) [health] user=0.58%, sys=0.42%, rss=84452 kB (+8), threads=62 (health:122)
2018-05-08 10:58:29,522+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:58:29,545+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 10:58:29,545+0300 DEBUG (health) [health] user=0.60%, sys=0.42%, rss=84500 kB (+48), threads=62 (health:122)
2018-05-08 10:59:29,546+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 10:59:29,569+0300 DEBUG (health) [health] Collected 137 objects (health:97)
2018-05-08 10:59:29,570+0300 DEBUG (health) [health] user=0.70%, sys=0.48%, rss=84876 kB (+376), threads=69 (health:122)
2018-05-08 11:00:29,570+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 11:00:29,593+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 11:00:29,594+0300 DEBUG (health) [health] user=0.60%, sys=0.43%, rss=84744 kB (-132), threads=62 (health:122)
2018-05-08 11:01:29,594+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 11:01:29,617+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 11:01:29,617+0300 DEBUG (health) [health] user=0.65%, sys=0.40%, rss=84748 kB (+4), threads=62 (health:122)
2018-05-08 11:02:29,618+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 11:02:29,641+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 11:02:29,641+0300 DEBUG (health) [health] user=0.57%, sys=0.40%, rss=84748 kB (+0), threads=62 (health:122)
2018-05-08 11:03:29,642+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 11:03:29,665+0300 DEBUG (health) [health] Collected 43 objects (health:97)
2018-05-08 11:03:29,665+0300 DEBUG (health) [health] user=0.60%, sys=0.43%, rss=84812 kB (+64), threads=62 (health:122)
2018-05-08 11:04:29,666+0300 DEBUG (health) [health] Checking health (health:90)
2018-05-08 11:04:29,690+0300 DEBUG (health) [health] Collected 137 objects (health:97)
2018-05-08 11:04:29,690+0300 DEBUG (health) [health] user=0.73%, sys=0.47%, rss=85008 kB (+196), threads=69 (health:122)

Health stats are also reported using the configured metrics collector,
(see [metrics] section in vdsm.conf) using these names:

    hosts.vdsm.gc.uncollectable
    hosts.vdsm.cpu.user_pct
    hosts.vdsm.cpu.sys_pct
    hosts.vdsm.cpu.memory.rss
    hosts.vdsm.threads_count

Nir
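
For anyone who wants to turn the [health] lines into a growth rate before
posting them, here is a minimal sketch in Python. It is not part of vdsm;
the names parse_rss_samples and rss_growth_kb_per_hour are made up for
illustration, and the regular expression assumes only the log format shown
in the example output. It also assumes all lines carry the same UTC offset,
so the offset is ignored.

```python
import re
from datetime import datetime

# Matches the per-check summary line of the "health" logger, e.g.
# "2018-05-08 10:54:29,451+0300 DEBUG (health) [health] user=..., rss=84192 kB (...)"
# Assumption: the format is exactly as in the example logs in this thread.
HEALTH_RSS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}[+-]\d{4} "
    r"DEBUG \(health\) \[health\] .*\brss=(\d+) kB"
)


def parse_rss_samples(lines):
    """Return a list of (timestamp, rss_kb) tuples found in the given lines."""
    samples = []
    for line in lines:
        match = HEALTH_RSS_RE.match(line)
        if match is None:
            continue  # "Checking health" / "Collected N objects" lines etc.
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        samples.append((timestamp, int(match.group(2))))
    return samples


def rss_growth_kb_per_hour(samples):
    """Average RSS growth between the first and last sample, in kB per hour.

    Returns None when there are fewer than two samples or no time elapsed.
    """
    if len(samples) < 2:
        return None
    (first_time, first_rss), (last_time, last_rss) = samples[0], samples[-1]
    seconds = (last_time - first_time).total_seconds()
    if seconds <= 0:
        return None
    return (last_rss - first_rss) * 3600.0 / seconds
```

To use it, pass an open file, for example
rss_growth_kb_per_hour(parse_rss_samples(open("/var/log/vdsm/vdsm.log"))).
A short window right after a restart will overstate the rate, since RSS is
still settling; a few hours of samples gives a more useful number.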
