Re: RFR: 8203359: Container level resources events

Jaroslav Bachorik Fri, 02 Apr 2021 04:16:59 -0700

On Thu, 1 Apr 2021 15:55:59 GMT, Jaroslav Bachorik <jbacho...@openjdk.org> 
wrote:


>>> Does each getter call result in parsing /proc, or are things aggregated over
>>> several calls or hooks?
>> 
>> From the looks of it, the event-emitting code uses the `Metrics.java` interface
>> to retrieve the info. Each call to a method exposed by `Metrics` results in
>> file IO on one or more cgroup (v1 or v2) interface files in `/sys/fs/...`. I
>> don't see any aggregation being done.
>> 
>> On the HotSpot side, we implemented some caching for frequent calls
>> (JDK-8232207, JDK-8227006), but we haven't done that yet on the Java side
>> since there hasn't been any need so far. If this change makes such calls
>> frequent, that decision should be reconsidered.
>> 
>> So +1 on getting some data on the performance penalty of this.
>
> Thanks to all for chiming in!
> 
> I have added the tests to
> `test/hotspot/jtreg/containers/docker/TestJFREvents.java`, which already
> contained some templates for the container event data.
> 
> As for performance: as expected, extracting the data from `/proc` is not
> exactly cheap. On my test c5.4xlarge instance, generating the usage and
> throttling events (one instance of each) takes ~15ms of wall-clock time on
> average.
> I would argue that 15ms per 30s (the default emission period for those
> events) might be acceptable to start with.
> 
> Caching the parsed cgroups data would help only if the emission period is
> shorter than the cache TTL. This limitation is compounded by the fact that
> almost every container event type requires data from a different cgroups
> control file, so the data would not be shared between event types even if
> cached. Realistically, caching benefits would become visible only for
> sub-second emission periods.
> 
> If caching is still required, I would suggest a follow-up ticket just for
> that: it will require setting up some benchmarks to justify the changes
> needed in the metrics implementation.
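For scale, taking the ~15ms per batch of usage/throttling events and the 30s default period from the quoted message at face value, the fraction of one thread's wall-clock time spent emitting these events is:

$$
\frac{15\ \text{ms}}{30\ \text{s}} = \frac{15}{30\,000} = 5 \times 10^{-4} = 0.05\%
$$

The same 15ms at a 1s period would be $15/1000 = 1.5\%$, so the cost grows inversely with the emission period.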

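The caching trade-off described in the quoted message can be illustrated with a small TTL cache. This is a minimal sketch under stated assumptions, not the JDK's `Metrics` implementation: `TtlFileCache`, its injectable clock and its `loader` function are hypothetical, and the `/sys/fs/cgroup/cpu.stat` string is only used as a map key, never actually read. Keying by file path mirrors the point that different event types read different control files and therefore would not share cached entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache; not part of the JDK's Metrics implementation.
// Entries are keyed by cgroup interface file path, so an event type only
// benefits from entries for the files it reads itself.
class TtlFileCache {
    private static final class Entry {
        final String value;
        final long loadedAtNanos;

        Entry(String value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock;              // injectable time source (nanoseconds)
    private final Function<String, String> loader; // stands in for reading the file at 'path'
    private int loads;                             // number of uncached reads

    TtlFileCache(long ttlNanos, LongSupplier clock, Function<String, String> loader) {
        this.ttlNanos = ttlNanos;
        this.clock = clock;
        this.loader = loader;
    }

    synchronized String get(String path) {
        long now = clock.getAsLong();
        Entry e = entries.get(path);
        if (e == null || now - e.loadedAtNanos >= ttlNanos) {
            e = new Entry(loader.apply(path), now); // miss or expired: reload
            entries.put(path, e);
            loads++;
        }
        return e.value;
    }

    synchronized int loads() {
        return loads;
    }

    // Simulates 'reads' periodic emissions against one file and returns how
    // many of them actually had to (re)load the data.
    static int loadsFor(long periodNanos, long ttlNanos, int reads) {
        long[] now = {0L};
        TtlFileCache cache = new TtlFileCache(ttlNanos, () -> now[0], path -> "contents of " + path);
        for (int i = 0; i < reads; i++) {
            cache.get("/sys/fs/cgroup/cpu.stat"); // hypothetical key only
            now[0] += periodNanos;
        }
        return cache.loads();
    }

    public static void main(String[] args) {
        long ttl = 1_000_000_000L; // assumed 1 s TTL
        System.out.println("30 s period:   " + loadsFor(30_000_000_000L, ttl, 10) + " loads / 10 reads"); // 10
        System.out.println("100 ms period: " + loadsFor(100_000_000L, ttl, 10) + " loads / 10 reads"); // 1
    }
}
```

With a 30s period and a 1s TTL every read is a miss, so the cache adds bookkeeping without saving any file IO; only when the period drops below the TTL do reads start being served from memory.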
I tried to measure the startup regression; here are my observations:
* Startup is not affected unless the application is started with JFR
* The extra events and hooks take ~5ms on my work machine
* It is possible not to register those events and hooks in a non-container
  environment; the overhead is then only the 20-50us it takes to determine
  whether the JVM is running in a container

To minimize the effect this change has on startup, I would suggest using
conditional registration, unless I hear strong objections to that.
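Conditional registration can be illustrated with the public JFR API's `FlightRecorder.addPeriodicEvent(Class, Runnable)`. This is a minimal sketch under stated assumptions: the event class and its name, the `readCpuUsage()` placeholder and the `example.containerized` system property are all hypothetical, and the container events in the PR may be registered through a different, JDK-internal mechanism. The point it shows is that the periodic hook is only added when a cheap container check passes.

```java
import java.util.function.BooleanSupplier;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;

class ConditionalRegistration {
    // Hypothetical example event; not one of the container events from the PR.
    @Name("example.ContainerCpuUsage")
    @Label("Container CPU Usage (example)")
    @Period("30 s")
    static class ContainerCpuUsageEvent extends Event {
        @Label("CPU Usage")
        long cpuUsage;
    }

    // Hypothetical placeholder for the (relatively expensive) cgroup read.
    static long readCpuUsage() {
        return -1L;
    }

    // Registers the periodic hook only if the container check passes,
    // so outside a container the only cost is the check itself.
    static boolean registerIfContainerized(BooleanSupplier isContainerized) {
        if (!isContainerized.getAsBoolean()) {
            return false;
        }
        FlightRecorder.addPeriodicEvent(ContainerCpuUsageEvent.class, () -> {
            ContainerCpuUsageEvent event = new ContainerCpuUsageEvent();
            event.cpuUsage = readCpuUsage();
            event.commit();
        });
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical switch; a real implementation would use the JDK's own
        // container detection rather than a system property.
        BooleanSupplier check = () -> Boolean.getBoolean("example.containerized");
        System.out.println("registered: " + registerIfContainerized(check));
    }
}
```

Running with `-Dexample.containerized=true` adds the hook; without the property no hook is registered and JFR never calls `readCpuUsage()`.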

-------------

PR: https://git.openjdk.java.net/jdk/pull/3126
