Rui Fan created FLINK-36071:
-------------------------------
Summary: Using System.nanoTime to measure the elapsed time instead
of System.currentTimeMillis
Key: FLINK-36071
URL: https://issues.apache.org/jira/browse/FLINK-36071
Project: Flink
Issue Type: Improvement
Components: Runtime / Metrics
Reporter: Rui Fan
Assignee: Rui Fan
A number of Flink metrics use System.currentTimeMillis[1] to measure
elapsed time. I propose refactoring them from System.currentTimeMillis to
System.nanoTime[2].
h1. Why do we need to refactor it?
Note: High precision *{color:#de350b}is not{color}* the reason for the refactor.
Actually, System.currentTimeMillis() and System.nanoTime() have completely
different semantics.
System.currentTimeMillis() *{color:#de350b}!={color}* System.nanoTime() /
1_000_000
* System.currentTimeMillis() is the current system time of the server.
** The time can be updated by NTP[3], or it can be adjusted manually.
* System.nanoTime() usually indicates the length of time since the operating
system was booted.
** So System.nanoTime isn't system time, and it's not affected by system time.
** System.nanoTime (inside the process) is monotonically increasing and never
goes back.
** As the Javadoc[2] states: this method can only be used to measure
elapsed time and is not related to any other notion of system or wall-clock
time.
This blog post[4] explains the difference in detail.
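To make the difference concrete, here is a minimal sketch of measuring a duration with System.nanoTime. The class and method names (ElapsedTimeExample, measureMillis) are made up for illustration and are not Flink code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Flink.
final class ElapsedTimeExample {

    /** Runs the task and returns its duration in milliseconds, measured with the monotonic clock. */
    static long measureMillis(Runnable task) {
        long startNanos = System.nanoTime();
        task.run();
        // Subtract first, then convert. Only the difference between two
        // nanoTime readings is meaningful, never a single reading.
        long elapsedNanos = System.nanoTime() - startNanos;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        long millis = measureMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + millis);
    }
}
```

If the same measurement were taken with System.currentTimeMillis, an NTP step or a manual clock change between the two readings could make the result too large, too small, or negative.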
h1. Current use cases:
Based on the previous section, we know that System.nanoTime is recommended for
measuring durations.
Most tracing systems use it, and Flink also uses it to measure the
duration for some metrics, such as:
* all latency tracking of the state backend
* SubtaskCheckpointCoordinatorImpl#takeSnapshotSync measures the checkpoint
Sync Duration
* etc
In addition, Flink's Clock[5] already separates absoluteTimeMillis,
relativeTimeMillis and relativeTimeNanos. But I guess most developers
don't know these details.
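The absolute/relative split can be illustrated with a small stand-in. The sketch below does not use Flink's Clock class; SketchClock only mirrors the three method names mentioned above, and SketchManualClock and ClockDemo are hypothetical helpers that simulate a wall-clock adjustment.

```java
// Hypothetical stand-in mirroring the three method names of Flink's Clock[5].
abstract class SketchClock {
    abstract long absoluteTimeMillis();   // wall-clock time, may jump
    abstract long relativeTimeMillis();   // monotonic, only differences are meaningful
    abstract long relativeTimeNanos();    // monotonic, only differences are meaningful
}

// Hypothetical manual clock whose wall-clock time can be stepped independently.
final class SketchManualClock extends SketchClock {
    private long wallMillis = 0;
    private long monotonicNanos = 0;

    @Override long absoluteTimeMillis() { return wallMillis; }
    @Override long relativeTimeMillis() { return monotonicNanos / 1_000_000; }
    @Override long relativeTimeNanos() { return monotonicNanos; }

    /** Real time passes: both clocks move forward. */
    void advance(long millis) {
        wallMillis += millis;
        monotonicNanos += millis * 1_000_000;
    }

    /** Simulates an NTP step or manual change: only the wall clock moves. */
    void adjustWallClock(long deltaMillis) {
        wallMillis += deltaMillis;
    }
}

final class ClockDemo {
    /** Returns {duration from absolute time, duration from relative time} in ms. */
    static long[] durationsAfterBackwardStep() {
        SketchManualClock clock = new SketchManualClock();
        long wallStart = clock.absoluteTimeMillis();
        long relativeStart = clock.relativeTimeMillis();

        clock.advance(100);            // 100 ms of real work
        clock.adjustWallClock(-5_000); // wall clock is set back by 5 s

        return new long[] {
            clock.absoluteTimeMillis() - wallStart,
            clock.relativeTimeMillis() - relativeStart
        };
    }
}
```

In this simulation the duration derived from absolute time comes out as -4900 ms, while the duration derived from relative time is the expected 100 ms.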
h1. Proposed changes:
This jira proposes that Flink uses System.nanoTime uniformly for duration
calculation.
Currently, many components still use System.currentTimeMillis to calculate
durations, including:
* TimerGauge
* TaskIOMetricGroup
* A lot of methods of StreamTask
* etc
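As a rough sketch of what such a refactoring could look like, the hypothetical timer below accumulates durations from a nanosecond source instead of System.currentTimeMillis. SketchTimer and its methods are assumptions for illustration; this is not the actual code of TimerGauge, TaskIOMetricGroup or StreamTask, and it is not thread-safe.

```java
import java.util.function.LongSupplier;

// Hypothetical timer for illustration only; not Flink's TimerGauge.
final class SketchTimer {
    private final LongSupplier nanoSource;
    private long startNanos;
    private boolean running;
    private long accumulatedNanos;

    /** Uses the JVM's monotonic clock. */
    SketchTimer() {
        this(System::nanoTime);
    }

    /** Allows a fake nanosecond source to be injected in tests. */
    SketchTimer(LongSupplier nanoSource) {
        this.nanoSource = nanoSource;
    }

    void markStart() {
        if (!running) {
            startNanos = nanoSource.getAsLong();
            running = true;
        }
    }

    void markEnd() {
        if (running) {
            // Difference of two monotonic readings, unaffected by wall-clock changes.
            accumulatedNanos += nanoSource.getAsLong() - startNanos;
            running = false;
        }
    }

    /** Converts to milliseconds only at the reporting boundary. */
    long getAccumulatedMillis() {
        return accumulatedNanos / 1_000_000;
    }
}
```

Keeping the internal state in nanoseconds and converting to milliseconds only when the value is reported lets the externally visible unit stay the same while the time source changes.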
[1] [https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#currentTimeMillis--]
[2] [https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--]
[3] [https://en.wikipedia.org/wiki/Network_Time_Protocol]
[4] [https://www.javaadvent.com/2019/12/measuring-time-from-java-to-kernel-and-back.html]
[5] [https://github.com/apache/flink/blob/729b8b81a77ba6c32711216b88a1bf57ccddfadc/flink-core/src/main/java/org/apache/flink/util/clock/Clock.java#L40]