I'll take a look next couple days.  I was out last few days and am catching up on other things.

Mandy

On 2/11/20 10:04 AM, Severin Gehwolf wrote:
Hi Mandy, Bob,

Thanks again for the reviews and patience on this. Sorry it took me so
long to get back to this :-/

Updated webrev:
Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
incremental: 
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/

I've tested this with docker tests on cgroup v1 and via podman on a
cgroup v2 system. They pass. I'll be running this through jdk-submit as
well.

More below.

On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
Hi Severin,

Thanks for the update.

On 1/21/20 11:30 AM, Severin Gehwolf wrote:
Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
incremental: 
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/

I have answered my own question. Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.

The following are about limits which used to return -1 if unlimited or no limit 
set.
     public long getCpuQuota();
     public long getCpuShares();
     public long getMemoryLimit();
     public long getMemoryAndSwapLimit();
     public long getMemorySoftLimit();

With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to 
return -1 if unlimited or no limit set.  However the implementation does return 
-1.  All of the above specify to return -2 if unavailable due to error fetching 
the metric.

I found the implementation quite hard to follow.  I spent some time reviewing 
the code to see if the implementation matches the spec  but I can't easily tell 
yet.  For example,
CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException 
occurs.
CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.

CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
when value overflows
This one is intentional. It's mapped back to unlimited via
longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
have a concept of "unlimited". Unlimited values will be a very large
numbers in cgroup v1 files.

CgroupV2SubsystemController::convertStringToLong returns -1 when IOException 
occurs

CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0
These two are special cases too. See the implementation note of
Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
(over 1024 in cgroup v1). That's why unlimited is being returned for
those values.

CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or 
malformed content
CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
    This is called by getBlkIOServiceCount and getBlkIOServiced

I think this can be improved and add the documentation to describe
what the methods do.  Since Metrics APIs consistently return -2 if
unavailable due to error in fetching the metric, why some utility
methods in *Subsystem and *SubsystemController return -1 upon error
and 0 when unlimited?

I suspect if the getXXXValue and other methods are clearly documented
with the error cases (possibly renaming the method name if appropriate)
CgroupV1Subsystem and CgroupV2SubSystem will become very explicit
to understand.
This should be fixed now.

I've gone through the API doc of Metrics.java and have updated it. In
general, I've updated it to return -1 if metric is unavailable (due to
error in reading some files or content being empty), and -2 if not
supported. No method returns -2 currently, but it might change and it's
good to have some way of saying "not implementable" for this subsystem
in the spec. That's my take on it anyway.

There is also a new unit test for shared controller logic:
TestCgroupSubsystemController.java

It execises various cases of error/success.

That is to ensure proper symmetry across the various cases (including
IOException). I've also documented static methods in
CgroupSubsystemController. Overall, all methods now return the same
values for cgroup v1 and cgroup v2 (given the impl nuances) for the
various cases.

CgroupSubsystem.java

   44     public static final double DOUBLE_RETVAL_NOT_SUPPORTED = 
LONG_RETVAL_NOT_SUPPORTED;
   49     public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;

They are no longer needed, right?
Removed.

CgroupSubsystemFactory.java

   89             System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not 
supported. Metrics disabled.");


I expect this be a System.Logger log
Updated.

  114                     if (!Integer.valueOf(0).toString().equals(tokens[0])) 
{

  This can be simplified to if (!"0".equals(tokens[0]))
Done, thanks!

LauncherHelper.java

  407         // Extended cgroupv1 specific metrics
  408         if (c instanceof MetricsCgroupV1) {
  409             MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
  410             limit = cgroupV1.getKernelMemoryLimit();
  411             ostream.println(formatLimitString(limit, INDENT + "Kernel Memory 
Limit: "));
  412             limit = cgroupV1.getTcpMemoryLimit();
  413             ostream.println(formatLimitString(limit, INDENT + "TCP Memory 
Limit: "));
  414             Boolean value = cgroupV1.isMemoryOOMKillEnabled();
  415             ostream.println(formatBoolean(value, INDENT + "Out Of Memory 
Killer Enabled: "));
  416             value = cgroupV1.isCpuSetMemoryPressureEnabled();
  417             ostream.println(formatBoolean(value, INDENT + "CPUSet Memory 
Pressure Enabled: "));
  418         }

MetricsCgroupV1 is linux-only.  It will fail the compilation when
building on non-linux.  One option is to move this code to
    src/java.base/linux/share/sun/launcher/CgroupMetrics.java

Are they continued to be interesting metrics to be output from
-XshowSetting?  I wonder if they can simply be dropped from the output.
Bob will have an opinion.
I've removed those extra cgroup v1 specific metrics printed via
-XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
only used in tests in webrev 10. On the other hand the idea would be
for consumers to downcast it to MetricsCgroupV1 if they needed those
extra metrics.

Thanks,
Severin


Reply via email to