Hi Mandy, Bob,
Thanks again for the reviews and patience on this. Sorry it took me so
long to get back to this :-/
Updated webrev:
Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
incremental:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/
I've tested this with docker tests on cgroup v1 and via podman on a
cgroup v2 system. They pass. I'll be running this through jdk-submit as
well.
More below.
On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
Hi Severin,
Thanks for the update.
On 1/21/20 11:30 AM, Severin Gehwolf wrote:
Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
incremental:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/
I have answered my own question. Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.
The following are about limits which used to return -1 if unlimited or no limit
set.
public long getCpuQuota();
public long getCpuShares();
public long getMemoryLimit();
public long getMemoryAndSwapLimit();
public long getMemorySoftLimit();
With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to
return -1 if unlimited or no limit set. However the implementation does return
-1. All of the above specify to return -2 if unavailable due to error fetching
the metric.
I found the implementation quite hard to follow. I spent some time reviewing
the code to see if the implementation matches the spec but I can't easily tell
yet. For example,
CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException
occurs.
CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.
CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
when value overflows
This one is intentional. It's mapped back to unlimited via
longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
have a concept of "unlimited". Unlimited values will be a very large
numbers in cgroup v1 files.
CgroupV2SubsystemController::convertStringToLong returns -1 when IOException
occurs
CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0
These two are special cases too. See the implementation note of
Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
(over 1024 in cgroup v1). That's why unlimited is being returned for
those values.
CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or
malformed content
CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
This is called by getBlkIOServiceCount and getBlkIOServiced
I think this can be improved and add the documentation to describe
what the methods do. Since Metrics APIs consistently return -2 if
unavailable due to error in fetching the metric, why some utility
methods in *Subsystem and *SubsystemController return -1 upon error
and 0 when unlimited?
I suspect if the getXXXValue and other methods are clearly documented
with the error cases (possibly renaming the method name if appropriate)
CgroupV1Subsystem and CgroupV2SubSystem will become very explicit
to understand.
This should be fixed now.
I've gone through the API doc of Metrics.java and have updated it. In
general, I've updated it to return -1 if metric is unavailable (due to
error in reading some files or content being empty), and -2 if not
supported. No method returns -2 currently, but it might change and it's
good to have some way of saying "not implementable" for this subsystem
in the spec. That's my take on it anyway.
There is also a new unit test for shared controller logic:
TestCgroupSubsystemController.java
It execises various cases of error/success.
That is to ensure proper symmetry across the various cases (including
IOException). I've also documented static methods in
CgroupSubsystemController. Overall, all methods now return the same
values for cgroup v1 and cgroup v2 (given the impl nuances) for the
various cases.
CgroupSubsystem.java
44 public static final double DOUBLE_RETVAL_NOT_SUPPORTED =
LONG_RETVAL_NOT_SUPPORTED;
49 public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;
They are no longer needed, right?
Removed.
CgroupSubsystemFactory.java
89 System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not
supported. Metrics disabled.");
I expect this be a System.Logger log
Updated.
114 if (!Integer.valueOf(0).toString().equals(tokens[0]))
{
This can be simplified to if (!"0".equals(tokens[0]))
Done, thanks!
LauncherHelper.java
407 // Extended cgroupv1 specific metrics
408 if (c instanceof MetricsCgroupV1) {
409 MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
410 limit = cgroupV1.getKernelMemoryLimit();
411 ostream.println(formatLimitString(limit, INDENT + "Kernel Memory
Limit: "));
412 limit = cgroupV1.getTcpMemoryLimit();
413 ostream.println(formatLimitString(limit, INDENT + "TCP Memory
Limit: "));
414 Boolean value = cgroupV1.isMemoryOOMKillEnabled();
415 ostream.println(formatBoolean(value, INDENT + "Out Of Memory
Killer Enabled: "));
416 value = cgroupV1.isCpuSetMemoryPressureEnabled();
417 ostream.println(formatBoolean(value, INDENT + "CPUSet Memory
Pressure Enabled: "));
418 }
MetricsCgroupV1 is linux-only. It will fail the compilation when
building on non-linux. One option is to move this code to
src/java.base/linux/share/sun/launcher/CgroupMetrics.java
Are they continued to be interesting metrics to be output from
-XshowSetting? I wonder if they can simply be dropped from the output.
Bob will have an opinion.
I've removed those extra cgroup v1 specific metrics printed via
-XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
only used in tests in webrev 10. On the other hand the idea would be
for consumers to downcast it to MetricsCgroupV1 if they needed those
extra metrics.
Thanks,
Severin