[ 
https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747373#comment-17747373
 ] 

ASF GitHub Bot commented on HDFS-17116:
---------------------------------------

haiyang1987 commented on code in PR #5876:
URL: https://github.com/apache/hadoop/pull/5876#discussion_r1274548707


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java:
##########
@@ -128,7 +129,6 @@ public void testRouterExitSafemode()
 
     assertTrue(router.getSafemodeService().isInSafeMode());
     verifyRouter(RouterServiceState.SAFEMODE);
-

Review Comment:
   Thanks @slfan1989 for helping me review it. I will update it later.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java:
##########
@@ -141,6 +141,31 @@ public void testRouterExitSafemode()
     verifyRouter(RouterServiceState.RUNNING);
   }
 
+  @Test
+  public void testRouterExitSafemodeResetUpTime()
+      throws InterruptedException, IllegalStateException, IOException {
+
+    Calendar calendar = Calendar.getInstance();
+    // Get the future times, add one day to the current date.
+    calendar.add(Calendar.DAY_OF_MONTH, 1);
+    long timestampAfterOneDay = calendar.getTimeInMillis();
+    router.getSafemodeService().setStartupTime(timestampAfterOneDay);
+
+    assertTrue(router.getSafemodeService().isInSafeMode());
+    verifyRouter(RouterServiceState.SAFEMODE);
+
+    // Wait for initial time in milliseconds
+    long interval =
+        conf.getTimeDuration(DFS_ROUTER_SAFEMODE_EXTENSION,
+            TimeUnit.SECONDS.toMillis(2), TimeUnit.MILLISECONDS) +
+            conf.getTimeDuration(DFS_ROUTER_CACHE_TIME_TO_LIVE_MS,
+                TimeUnit.SECONDS.toMillis(1), TimeUnit.MILLISECONDS) * 2;
+    Thread.sleep(interval);

Review Comment:
   I will update it later.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java:
##########
@@ -141,6 +141,31 @@ public void testRouterExitSafemode()
     verifyRouter(RouterServiceState.RUNNING);
   }
 
+  @Test
+  public void testRouterExitSafemodeResetUpTime()
+      throws InterruptedException, IllegalStateException, IOException {
+
+    Calendar calendar = Calendar.getInstance();
+    // Get the future times, add one day to the current date.
+    calendar.add(Calendar.DAY_OF_MONTH, 1);
+    long timestampAfterOneDay = calendar.getTimeInMillis();
+    router.getSafemodeService().setStartupTime(timestampAfterOneDay);
+
+    assertTrue(router.getSafemodeService().isInSafeMode());
+    verifyRouter(RouterServiceState.SAFEMODE);
+
+    // Wait for initial time in milliseconds
+    long interval =

Review Comment:
   I will update it later.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterSafemode.java:
##########
@@ -141,6 +141,31 @@ public void testRouterExitSafemode()
     verifyRouter(RouterServiceState.RUNNING);
   }
 
+  @Test
+  public void testRouterExitSafemodeResetUpTime()
+      throws InterruptedException, IllegalStateException, IOException {
+
+    Calendar calendar = Calendar.getInstance();
+    // Get the future times, add one day to the current date.
+    calendar.add(Calendar.DAY_OF_MONTH, 1);
+    long timestampAfterOneDay = calendar.getTimeInMillis();
+    router.getSafemodeService().setStartupTime(timestampAfterOneDay);
+
+    assertTrue(router.getSafemodeService().isInSafeMode());
+    verifyRouter(RouterServiceState.SAFEMODE);
+
+    // Wait for initial time in milliseconds

Review Comment:
   Thanks @hfutatzhanghb for helping me review it. I will update it later.





> Reset startupTime and enterSafeModeTime if check time interval is negative 
> during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17116
>                 URL: https://issues.apache.org/jira/browse/HDFS-17116
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> The following problem occurred in our production environment:
> # After the machine restarted, the system time was abnormal: it was set to a 
> time in the future.
> # After the router started, it logged "safemode exit for 24981702 
> milliseconds..." and remained in the safemode state.
> This happens because startupTime is recorded from the future system time 
> when the router starts. The system time returns to normal soon afterwards, 
> which results in a negative delta.
> At that point, the service can only be restored by restarting the router service.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO  ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) 
> [main] - STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO xxx
> 2023-07-15 03:22:00,257 INFO xxx
> 2023-07-15 03:22:29,829 INFO router.RouterSafemodeService 
> (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - 
> Delaying safemode exit for 28761777 milliseconds...
> {code}
> We could handle this case at the code level by resetting startupTime and 
> enterSafeModeTime when the delta is negative,
> which would ensure that the router service can still exit the safemode state 
> normally after the system time returns to normal.
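
The proposal in the description can be illustrated with a small, self-contained sketch. This is not the RouterSafemodeService code from the PR: the class name SafemodeClockSketch, the injectable clock, and the startupIntervalMs parameter are all hypothetical and exist only to show the idea of resetting both timestamps when the measured delta is negative.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for a safemode exit check. It is NOT the actual
 * RouterSafemodeService; names and structure are assumptions for illustration.
 */
public class SafemodeClockSketch {
  private final LongSupplier clock;      // injectable wall clock, in milliseconds
  private final long startupIntervalMs;  // assumed minimum time to stay in safemode
  private long startupTime;
  private long enterSafeModeTime;
  private boolean inSafeMode = true;

  public SafemodeClockSketch(LongSupplier clock, long startupIntervalMs) {
    this.clock = clock;
    this.startupIntervalMs = startupIntervalMs;
    this.startupTime = clock.getAsLong();
    this.enterSafeModeTime = this.startupTime;
  }

  /** Runs one periodic check and returns true while still in safemode. */
  public boolean periodicCheck() {
    long now = clock.getAsLong();
    long delta = now - startupTime;
    if (delta < 0) {
      // The clock moved backwards since startup (for example, a wrong
      // future time was corrected). Restart the measurement from "now".
      startupTime = now;
      enterSafeModeTime = now;
      delta = 0;
    }
    if (inSafeMode && delta >= startupIntervalMs) {
      inSafeMode = false;
    }
    return inSafeMode;
  }

  public long getEnterSafeModeTime() {
    return enterSafeModeTime;
  }

  public static void main(String[] args) {
    long[] now = {100_000_000L};            // abnormal "future" time at startup
    SafemodeClockSketch sketch = new SafemodeClockSketch(() -> now[0], 3_000L);

    now[0] = 50_000L;                       // system time returns to normal
    System.out.println(sketch.periodicCheck());  // negative delta, timestamps reset

    now[0] += 3_000L;                       // the startup interval elapses
    System.out.println(sketch.periodicCheck());  // safemode can now be left
  }
}
```

Without the reset branch, the second check in this sketch would still see a large negative delta and the service would stay in safemode until the clock caught up with the recorded startup time.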



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
