kaisun2000 commented on a change in pull request #365: Fix RoutingTableProvider statePropagationLatency metric reporting bug
URL: https://github.com/apache/helix/pull/365#discussion_r308428467
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/common/caches/CurrentStateSnapshot.java
 ##########
 @@ -32,18 +37,32 @@ public CurrentStateSnapshot(final Map<PropertyKey, CurrentState> currentStateMap
     if (_updatedStateKeys != null && _prevStateMap != null) {
       // Note if the prev state map is empty, this is the first time refresh.
       // So the update is not considered as "recent" change.
+      int driftCnt = 0; // clock drift count for comparing timestamp
       for (PropertyKey propertyKey : _updatedStateKeys) {
         CurrentState prevState = _prevStateMap.get(propertyKey);
         CurrentState curState = _properties.get(propertyKey);
 
         Map<String, Long> partitionUpdateEndTimes = null;
         for (String partition : curState.getPartitionStateMap().keySet()) {
           long newEndTime = curState.getEndTime(partition);
-          if (prevState == null || prevState.getEndTime(partition) < newEndTime) {
+          if (prevState == null
+              || prevState.getEndTime(partition) < newEndTime && prevState.getEndTime(partition) != -1) {
             if (partitionUpdateEndTimes == null) {
               partitionUpdateEndTimes = new HashMap<>();
             }
             partitionUpdateEndTimes.put(partition, newEndTime);
+          } else if (prevState != null && prevState.getEndTime(partition) > newEndTime) {
+            // This can happen due to clock drift.
+            // updatedStateKeys is the path to resource in an instance config.
+            // Thus, the space of inner loop is Sigma{replica(i) * partition(i)}; i over all resources in the cluster
+            // This space can be large. In order not to print too many lines, we print only the first warning.
+            // If clock drift turns out to be common, we can consider printing more logs or exposing a metric.
+            if (driftCnt < 1) {
+              LOG.warn(
+                  "clock drift. partition:" + partition + " curState:" + curState.getState(partition) + " prevState: "
+                      + prevState.getState(partition));
+            }
+            driftCnt++;
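The added condition relies on Java's operator precedence: `&&` binds tighter than `||`, so `prevState == null || prev < new && prev != -1` is read as `prevState == null || (prev < new && prev != -1)`. Below is a minimal, hypothetical sketch of the per-partition decision with that grouping written out explicitly. It assumes that an end time of `-1` means "not recorded" and models `CurrentState.getEndTime(partition)` as a plain `Long`, with `null` standing in for a missing previous state. `EndTimeComparison`, `Outcome`, and `classify` are illustrative names, not Helix APIs.

```java
// Hypothetical sketch of the per-partition end-time decision in the diff.
// ASSUMPTION: an end time of -1 means "not recorded"; a null prevEndTime
// models "prevState == null".
final class EndTimeComparison {
  enum Outcome { UPDATED, DRIFT, UNCHANGED }

  static final long UNSET = -1L;

  static Outcome classify(Long prevEndTime, long newEndTime) {
    // && binds tighter than ||, so the diff's unparenthesized condition
    // is equivalent to this explicitly grouped form.
    if (prevEndTime == null || (prevEndTime < newEndTime && prevEndTime != UNSET)) {
      return Outcome.UPDATED;
    }
    if (prevEndTime > newEndTime) {
      // The new end time is earlier than the previous one: treated as clock drift.
      return Outcome.DRIFT;
    }
    // Equal end times, or a previous end time of -1, are not counted as updates.
    return Outcome.UNCHANGED;
  }
}
```

Under this reading, a partition whose previous end time was `-1` is neither recorded as a recent update nor reported as drift.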
 
 Review comment:
  Somehow, the previous comment got lost. Yes, driftCnt is meant to limit logging to one warning per invocation of this method; otherwise, a single invocation could print too many lines.
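A minimal sketch of the one-warning-per-invocation limit, assuming plain maps from partition name to end time in place of `CurrentState` objects and a returned list in place of `LOG.warn`. `DriftLogLimiter`, `scan`, `Result`, and `MAX_WARNINGS_PER_CALL` are hypothetical names. As with `driftCnt` in the diff, the counter lives inside the call, so the cap resets on every invocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: emit at most one drift warning per call while still
// counting every drift. The warnings list stands in for LOG.warn(...).
final class DriftLogLimiter {
  static final int MAX_WARNINGS_PER_CALL = 1;

  static final class Result {
    final List<String> warnings = new ArrayList<>();
    int driftCount = 0;
  }

  static Result scan(Map<String, Long> prevEndTimes, Map<String, Long> curEndTimes) {
    Result result = new Result(); // fresh per call, like the local driftCnt
    for (Map.Entry<String, Long> e : curEndTimes.entrySet()) {
      Long prev = prevEndTimes.get(e.getKey());
      long cur = e.getValue();
      if (prev != null && prev > cur) {
        // Only the first drift in this call produces a warning line.
        if (result.driftCount < MAX_WARNINGS_PER_CALL) {
          result.warnings.add("clock drift. partition:" + e.getKey()
              + " prevEndTime:" + prev + " curEndTime:" + cur);
        }
        result.driftCount++;
      }
    }
    return result;
  }
}
```

Every drift is still counted after the first warning has been emitted, so the total remains available if drift turns out to be common and exposing a metric becomes worthwhile.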

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
