SuXingLee commented on issue #7966: [FLINK-11887][metrics] Fixed latency metrics drift apart URL: https://github.com/apache/flink/pull/7966#issuecomment-472906062

Thanks for your comment.

We don't use `System.nanoTime()` to compute latency metrics directly. When a shuffle happens between the source (node A) and the operator (node B), the latency value is `endTime - startTime`. `startTime` is produced by the source (TaskManager A), but `endTime` is produced by the operator (TaskManager B), and as we know, `System.nanoTime()` is only meaningful within a single JVM instance. So changing `LatencyStats` to use `System.nanoTime()` instead would not be the right approach.

Coming back to this issue, [FLINK-11887](https://issues.apache.org/jira/browse/FLINK-11887): originally, `startTime` was obtained by using `SystemProcessingTimeService#scheduleAtFixedRate` to accumulate a fixed time interval on each periodic callback. As time goes on, there is no guarantee that `startTime` and the actual time don't drift apart, especially if they are taken on different machines. In my cluster environment, I found that `startTime` was much later than the actual time.

If we changed `LatencyStats` to use `SystemProcessingTimeService#scheduleAtFixedRate` to acquire `endTime` as well, we still could not avoid drift between different nodes. In many data centers, Linux machines use the Network Time Protocol (NTP) to synchronize their clocks, so computing `System.currentTimeMillis()` (endTime) minus `System.currentTimeMillis()` (startTime) is a relatively accurate approach.
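
To illustrate the drift described above, here is a minimal, deterministic sketch. It is not Flink code: the class `LatencyMarkerTimeSketch`, its methods, and all the numbers (a nominal 1000 ms period, callbacks that actually fire every 999 ms, a true latency of 5 ms) are hypothetical and chosen only to make the effect visible. It compares a `startTime` built by accumulating the nominal interval with a `startTime` read from the wall clock at emission time.

```java
// Hypothetical illustration only; none of these names exist in Flink.
class LatencyMarkerTimeSketch {

    /** startTime built by adding the nominal period once per callback. */
    static long accumulatedStartTime(long initialMillis, long nominalPeriodMillis, long ticks) {
        return initialMillis + nominalPeriodMillis * ticks;
    }

    /** Wall-clock time at the same callback, if callbacks really fire every actualPeriodMillis. */
    static long actualTime(long initialMillis, long actualPeriodMillis, long ticks) {
        return initialMillis + actualPeriodMillis * ticks;
    }

    /** Latency as the downstream operator would compute it. */
    static long latencyMillis(long startMillis, long endMillis) {
        return endMillis - startMillis;
    }

    public static void main(String[] args) {
        long initial = 0L;          // assumed clock reading when the source starts
        long nominalPeriod = 1000L; // assumed configured marker interval
        long actualPeriod = 999L;   // assumed real spacing between callbacks
        long ticks = 3600L;         // number of callbacks so far
        long trueLatency = 5L;      // assumed real source-to-operator latency

        long accumulated = accumulatedStartTime(initial, nominalPeriod, ticks);
        long actual = actualTime(initial, actualPeriod, ticks);
        long endTime = actual + trueLatency; // operator reads its (synchronized) wall clock

        System.out.println("drift of accumulated startTime: " + (accumulated - actual) + " ms");
        System.out.println("latency from accumulated startTime: " + latencyMillis(accumulated, endTime) + " ms");
        System.out.println("latency from wall-clock startTime: " + latencyMillis(actual, endTime) + " ms");
    }
}
```

With these assumed numbers the accumulated `startTime` ends up ahead of the wall clock, so the reported latency turns negative, while reading the wall clock on both sides recovers the assumed 5 ms (provided the two machines' clocks are synchronized, for example via NTP).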
