[ https://issues.apache.org/jira/browse/HDFS-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679353#comment-17679353 ]

ASF GitHub Bot commented on HDFS-16890:
---------------------------------------

simbadzina commented on code in PR #5298:
URL: https://github.com/apache/hadoop/pull/5298#discussion_r1083133346


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java:
##########
@@ -211,13 +222,19 @@ public RouterRpcClient(Configuration conf, Router router,
     this.observerReadEnabledDefault = conf.getBoolean(
         RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY,
         RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_VALUE);
-    String[] observerReadOverrides = conf.getStrings(RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES);
+    String[] observerReadOverrides =
+        conf.getStrings(RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES);
     if (observerReadOverrides != null) {
       observerReadEnabledOverrides.addAll(Arrays.asList(observerReadOverrides));
     }
     if (this.observerReadEnabledDefault) {
       LOG.info("Observer read is enabled for router.");
     }
+    this.activeNNStateIdRefreshPeriodMs = conf.getTimeDuration(
+        RBFConfigKeys.DFS_ROUTER_OBSERVER_STATE_ID_REFRESH_PERIOD_KEY,
+        RBFConfigKeys.DFS_ROUTER_OBSERVER_STATE_ID_REFRESH_PERIOD_DEFAULT,
+        TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
+    this.lastActiveNNRefreshTimes = new HashMap<>();

Review Comment:
   We do need locking. The doc for HashMap says
   
   > If multiple threads access a hash map concurrently, and at least one of
   > the threads modifies the map structurally, it must be synchronized externally.
   
   In our case, the only structural modification to the map is the creation of
   an entry. Updates to the stateID are handled by LongAccumulator, which is
   thread-safe. However, to prevent a potential race and exception during
   creation of the entry, I've switched to ConcurrentHashMap.
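
   The pattern described in this comment can be sketched roughly as follows.
   This is a minimal illustration, not the code from the PR: the class
   `StateIdTracker`, its methods, and the choice of `Math::max` as the
   accumulator function are assumptions made for the example. It shows a
   ConcurrentHashMap guarding entry creation while a LongAccumulator handles
   concurrent updates to the value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical helper for illustration only; not part of RouterRpcClient.
class StateIdTracker {
  // Creating an entry is the only structural modification to the map.
  // ConcurrentHashMap makes that creation safe without external locking.
  private final Map<String, LongAccumulator> stateIds =
      new ConcurrentHashMap<>();

  /** Records a state ID for a nameservice, keeping the largest value seen. */
  void update(String nsId, long stateId) {
    stateIds
        // Atomically creates the accumulator on first use of this key.
        .computeIfAbsent(nsId,
            k -> new LongAccumulator(Math::max, Long.MIN_VALUE))
        // LongAccumulator is itself thread-safe for concurrent updates.
        .accumulate(stateId);
  }

  /** Returns the largest state ID seen, or Long.MIN_VALUE if none. */
  long get(String nsId) {
    LongAccumulator acc = stateIds.get(nsId);
    return acc == null ? Long.MIN_VALUE : acc.get();
  }
}
```

   With a plain HashMap, two threads creating entries at the same time could
   corrupt the map or lose an entry; `computeIfAbsent` on a ConcurrentHashMap
   avoids that without a separate lock.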





> RBF: Add period state refresh to keep router state near active namenode's
> -------------------------------------------------------------------------
>
>                 Key: HDFS-16890
>                 URL: https://issues.apache.org/jira/browse/HDFS-16890
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Simbarashe Dzinamarira
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>
> When using the ObserverReadProxyProvider, clients can set 
> *dfs.client.failover.observer.auto-msync-period...* to periodically get the 
> Active namenode's state. When using routers without the 
> ObserverReadProxyProvider, this periodic update is lost.
> In a busy cluster, the Router constantly gets updated with the active 
> namenode's state when
>  # There is a write operation.
>  # There is an operation (read/write) from a new client.
> However, in the scenario when there are no new clients and no write 
> operations, the state kept in the router can lag behind the active's. The 
> router does update its state with responses from the Observer, but the 
> observer may be lagging behind too.
> We should have a periodic refresh in the router to serve a similar role as 
> *dfs.client.failover.observer.auto-msync-period*
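
The periodic refresh proposed in the description above could be gated along
the lines of the sketch below. It is only an illustration under stated
assumptions: `RefreshGate` and `shouldRefresh` are hypothetical names, the
caller supplies the current time in milliseconds, and how the router actually
fetches the active namenode's state is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch for illustration only; not the PR's implementation.
class RefreshGate {
  private final long refreshPeriodMs;
  // Last refresh time per nameservice, in milliseconds.
  private final Map<String, Long> lastRefreshMs = new ConcurrentHashMap<>();

  RefreshGate(long refreshPeriodMs) {
    this.refreshPeriodMs = refreshPeriodMs;
  }

  /**
   * Returns true if the state for nsId was never refreshed or the refresh
   * period has elapsed, and records nowMs as the new refresh time.
   */
  boolean shouldRefresh(String nsId, long nowMs) {
    boolean[] due = {false};
    // compute() runs atomically per key, so only one caller wins a period.
    lastRefreshMs.compute(nsId, (k, last) -> {
      if (last == null || nowMs - last >= refreshPeriodMs) {
        due[0] = true;
        return nowMs;
      }
      return last;
    });
    return due[0];
  }
}
```

A caller would check `shouldRefresh(nsId, now)` before serving a read and, when
it returns true, contact the active namenode instead of relying on the state
already cached in the router.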


