SuXingLee commented on issue #7966: [FLINK-11887][metrics] Fixed latency metrics drift apart
URL: https://github.com/apache/flink/pull/7966#issuecomment-472906062
 
 
   Thanks for your comment.
   We should not use ```System.nanoTime()``` to compute the latency metrics
directly. When a shuffle happens between the source (node A) and an operator
(node B), the latency value is ```endTime - startTime```.
   ```startTime``` is produced by the source (on TaskManager A), but ```endTime```
is produced by the operator (on TaskManager B). As we know, ```System.nanoTime()```
values are only meaningful relative to each other within a single JVM instance:
each JVM measures from its own arbitrary origin.
   So changing ```LatencyStats``` to use ```System.nanoTime()``` instead would
not be correct.
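A minimal sketch of the point above, assuming we model each JVM's ```System.nanoTime()``` reading as "true time plus that JVM's arbitrary origin". The class, method names, and origin offsets are hypothetical and only illustrate why subtracting readings taken on two different JVMs yields a meaningless number, while a difference within one JVM is a valid duration.

```java
import java.util.concurrent.TimeUnit;

final class NanoTimeOriginSketch {
    // Valid use: both readings come from the same JVM, so the difference is an elapsed duration.
    static long elapsedNanosInSameJvm(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    // Invalid use across JVMs, modeled with hypothetical per-JVM origin offsets:
    // a nanoTime reading is treated as (true time + that JVM's arbitrary origin).
    static long naiveCrossJvmDeltaNanos(long trueStart, long trueEnd, long originA, long originB) {
        long startReadOnA = trueStart + originA; // taken by the source on JVM A
        long endReadOnB = trueEnd + originB;     // taken by the operator on JVM B
        return endReadOnB - startReadOnA;        // = true latency + (originB - originA)
    }

    public static void main(String[] args) {
        long elapsed = elapsedNanosInSameJvm(() -> Math.sqrt(42.0));
        System.out.println("same-JVM elapsed ms: " + TimeUnit.NANOSECONDS.toMillis(elapsed));

        // The true latency is 3 ms (3_000_000 ns), but the naive cross-JVM difference is not.
        long delta = naiveCrossJvmDeltaNanos(1_000_000L, 4_000_000L, 7_000_000_000L, -2_000_000_000L);
        System.out.println(delta); // -8997000000
    }
}
```

The error term ```originB - originA``` is unknown in practice, which is why no amount of care on either side can make a cross-JVM ```nanoTime``` subtraction correct.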
   
   Coming back to this issue,
[FLINK-11887](https://issues.apache.org/jira/browse/FLINK-11887): the original
way of obtaining ```startTime``` uses
```SystemProcessingTimeService#scheduleAtFixedRate```, which accumulates a fixed
time interval on every period instead of reading the current time.
   Over time, there is no guarantee that ```startTime``` and the actual time will
not drift apart, especially when they are produced on different machines. In my
cluster environment, I found that ```startTime``` was much later than the actual
time.
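A minimal, self-contained simulation (not Flink code) of how an accumulated timestamp can drift: if the n-th periodic callback is handed ```initialTime + n * period``` rather than a fresh ```System.currentTimeMillis()``` reading, any gap between the scheduler's timing and the wall clock is never corrected and keeps growing. The class and method names are hypothetical, and the wall-clock readings are made-up values chosen to mimic the observation above (```startTime``` later than the actual time).

```java
import java.util.ArrayList;
import java.util.List;

final class FixedRateDriftSketch {
    // Timestamp handed to the n-th periodic callback when it is accumulated as
    // initialTime + n * period instead of being re-read from the wall clock.
    static long accumulatedTimestamp(long initialTime, long period, int n) {
        return initialTime + (long) n * period;
    }

    // Drift per callback: accumulated timestamp minus the wall clock at firing time.
    // A positive value means the "startTime" is later than the actual time.
    static List<Long> drifts(long initialTime, long period, long[] wallClockAtFire) {
        List<Long> result = new ArrayList<>();
        for (int n = 0; n < wallClockAtFire.length; n++) {
            result.add(accumulatedTimestamp(initialTime, period, n) - wallClockAtFire[n]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical wall-clock readings (ms) at each firing, e.g. while the host's
        // clock is being adjusted by NTP. The accumulated timestamps never notice.
        long[] wallClock = {1_000, 1_999, 2_997, 3_994, 4_990};
        System.out.println(drifts(1_000, 1_000, wallClock)); // [0, 1, 3, 6, 10]
    }
}
```

Because each node runs its own scheduler, each node accumulates its own, independent drift.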
   If we changed ```LatencyStats``` to obtain ```endTime``` via
```SystemProcessingTimeService#scheduleAtFixedRate``` as well, we still could not
avoid the drift between different nodes, because each node would drift
independently.
   In many data centers, Linux machines use the Network Time Protocol (NTP) to
synchronize their clocks. So computing the latency as
```System.currentTimeMillis()``` on the receiving node (endTime) minus
```System.currentTimeMillis()``` on the source node (startTime) is a relatively
accurate approach.
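A minimal sketch of the proposed wall-clock approach, assuming a hypothetical ```Marker``` type that carries the source's ```System.currentTimeMillis()``` reading to the downstream operator (this is not Flink's actual ```LatencyMarker``` API). The clock is injected as a ```LongSupplier``` only so the example can be run deterministically; in production both sides would use ```System::currentTimeMillis``` on their own hosts.

```java
import java.util.function.LongSupplier;

final class WallClockLatencySketch {
    // Hypothetical marker: carries the wall-clock time at which the source emitted it.
    static final class Marker {
        final long markedTimeMillis;

        Marker(long markedTimeMillis) {
            this.markedTimeMillis = markedTimeMillis;
        }
    }

    // Source side (e.g. TaskManager A): stamp the marker with this host's wall clock.
    static Marker emit(LongSupplier clock) {
        return new Marker(clock.getAsLong());
    }

    // Operator side (e.g. TaskManager B): subtract using this host's wall clock.
    // The result is only as accurate as the NTP synchronization between A and B,
    // and it can even be negative if B's clock is behind A's.
    static long latencyMillis(Marker marker, LongSupplier clock) {
        return clock.getAsLong() - marker.markedTimeMillis;
    }

    public static void main(String[] args) {
        Marker live = emit(System::currentTimeMillis);
        System.out.println("live latency ms: " + latencyMillis(live, System::currentTimeMillis));

        // Deterministic example with fixed, hypothetical clock readings.
        Marker fixed = emit(() -> 1_000_000L);
        System.out.println(latencyMillis(fixed, () -> 1_000_042L)); // 42
    }
}
```

Unlike the accumulated timestamp, each reading here comes straight from the host clock, so the only remaining error is the residual clock offset between the two machines.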

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


With regards,
Apache Git Services
