On Thu, 1 Apr 2021 15:55:59 GMT, Jaroslav Bachorik <jbacho...@openjdk.org> wrote:
>>> Does each getter call result in parsing /proc, or do things aggregated over
>>> several calls or hooks?
>>
>> From the looks of it the event emitting code uses `Metrics.java` interface
>> for retrieving the info. Each call to a method exposed by Metrics result in
>> file IO on some cgroup (v1 or v2) interface file(s) in `/sys/fs/...`. I
>> don't see any aggregation being done.
>>
>> On the hotspot side, we implemented some caching for frequent calls
>> (JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side
>> since there wasn't any need (so far). If calls are becoming frequent with
>> this it should be reconsidered.
>>
>> So +1 on getting some data on what the perf penalty of this is.
>
> Thanks to all for chiming in!
>
> I have added the tests to
> `test/hotspot/jtreg/containers/docker/TestJFREvents.java` where there already
> were some templates for the container event data.
>
> As for the performance - as expected, extracting the data from `/proc` is not
> exactly cheap. On my test c5.4xlarge instance I am getting an average
> wall-clock time to generate the usage/throttling events (one instance of
> each) of ~15ms.
> I would argue that 15ms per 30s (the default emission period for those
> events) might be acceptable to start with.
>
> Caching of cgroups parsed data would help if the emission period is shorter
> than the cache TTL. This is exacerbated by the fact that (almost) each
> container event type requires data from a different cgroups control file -
> hence the data will not be shared between the event type instances even if
> cached. Realistically, caching benefits would become visible only for
> sub-second emission periods.
>
> If the caching is still required I would suggest having a follow up ticket
> just for that - it will require setting up some benchmarks to justify the
> changes that would need to be done in the metrics implementation.
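
For reference, the TTL caching discussed in the quoted text could look roughly like the sketch below. This is only an illustration: `TtlCached`, its constructor parameters, and the injectable clock are hypothetical names and do not reflect how the hotspot caching (JDK-8232207, JDK-8227006) or the `Metrics` implementation actually work. It shows why a cache only saves file IO when reads arrive more often than the TTL expires.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches one loaded value for a fixed time-to-live.
final class TtlCached<T> implements Supplier<T> {
    private final Supplier<T> loader;   // e.g. a read of one cgroup control file
    private final long ttlNanos;        // how long a loaded value stays valid
    private final LongSupplier clock;   // injectable for tests; System::nanoTime in practice

    private T value;
    private long loadedAt;
    private boolean loaded;

    TtlCached(Supplier<T> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    @Override
    public synchronized T get() {
        long now = clock.getAsLong();
        // Reload on first use or once the cached value is older than the TTL.
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

With a 30s emission period and a sub-second TTL, every `get()` would still reload, which matches the point that the benefit only shows up for sub-second emission periods. Each control file would also need its own `TtlCached` instance, since the event types read different files.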
I tried to measure the startup regression and here are my observations:

* Startup is not affected unless the application is started with JFR
* The extra events and hooks take ~5ms on my work machine
* It is possible not to register those events and hooks in a non-container
  env - then the overhead is 20-50us, which is the time it takes to figure
  out whether we are running in a container

In order to minimize the effect this change will have on the startup, I would
suggest using conditional registration unless I hear strong objections to that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3126
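
A rough sketch of the conditional registration suggested above, using the public `jdk.jfr` API. The event class, its name and field, and the `inContainer` check are hypothetical placeholders for illustration; they are not the events or the detection logic from the PR.

```java
import java.util.function.BooleanSupplier;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;

final class ContainerEventRegistration {
    // Hypothetical event type, for illustration only.
    @Name("example.ContainerCPUUsage")
    @Label("Container CPU Usage (example)")
    @Period("30 s")
    static class ExampleCpuUsageEvent extends Event {
        @Label("CPU Time")
        long cpuTime;
    }

    // Hook that JFR runs once per emission period while a recording is active.
    private static final Runnable HOOK = () -> {
        ExampleCpuUsageEvent event = new ExampleCpuUsageEvent();
        event.cpuTime = 0L; // placeholder; real code would read the cgroup data here
        event.commit();
    };

    // Registers the periodic hook only if the (hypothetical) check reports a container.
    // Returns true if the hook was registered.
    static boolean registerIfContainerized(BooleanSupplier inContainer) {
        if (!inContainer.getAsBoolean()) {
            return false; // non-container env: pay only for the check itself
        }
        FlightRecorder.addPeriodicEvent(ExampleCpuUsageEvent.class, HOOK);
        return true;
    }
}
```

In a non-container environment the only cost left is the `inContainer` check, which corresponds to the 20-50us figure above.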