[
https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haiyang Hu updated HDFS-17116:
------------------------------
Description:
The following problem occurred in our production environment:
# After the machine restarted, the system time was abnormal: it was set to a
time in the future.
# After the router was started, it logged "safemode exit for 24981702
milliseconds..." and remained in safemode.
This is mainly because startupTime was recorded from the future system time
when the router started. The system time returned to normal soon afterwards,
resulting in a negative delta.
At that point, the service could only be restored by restarting the router
service.
The relevant logs are:
{code:java}
2023-07-15 03:15:49,276 INFO ipc.Server xxx
2023-07-15 11:21:03,785 INFO router.DFSRouter (LogAdapter.java:info(51))
[main] - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Router
...
2023-07-15 11:21:51,325 INFO xxx
2023-07-15 03:22:00,257 INFO xxx
2023-07-15 03:22:29,829 INFO router.RouterSafemodeService
(RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] -
Delaying safemode exit for 28761777 milliseconds...
{code}
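The backwards clock jump can be read directly off two consecutive timestamps in the log above (11:21:51,325 followed by 03:22:00,257). The following is only an illustrative sketch: the class name ClockJumpCheck and the helper deltaMillis are hypothetical and are not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Hadoop code: measures the gap between two log timestamps.
class ClockJumpCheck {
  // Timestamp layout used by the log lines quoted above.
  private static final DateTimeFormatter LOG_TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  /** Milliseconds from 'earlier' to 'later'; negative if the clock went backwards. */
  static long deltaMillis(String earlier, String later) {
    LocalDateTime a = LocalDateTime.parse(earlier, LOG_TS);
    LocalDateTime b = LocalDateTime.parse(later, LOG_TS);
    return Duration.between(a, b).toMillis();
  }

  public static void main(String[] args) {
    // Two consecutive lines from the log: the second is almost 8 hours "earlier".
    long delta = deltaMillis("2023-07-15 11:21:51,325", "2023-07-15 03:22:00,257");
    System.out.println(delta); // prints -28791068
  }
}
```

A delta of this size and sign matches the order of magnitude of the delay reported by RouterSafemodeService in the last log line.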
Perhaps we can handle this case in the code by resetting startupTime and
enterSafeModeTime when the delta is negative.
This would ensure that the router service can still exit safemode normally
after the system time returns to normal.
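A minimal sketch of the proposed reset, under stated assumptions: this is not the actual RouterSafemodeService code, and the class SafemodeClockGuard, the injectable clock, and the startupIntervalMs parameter are hypothetical names introduced only for illustration. Only startupTime and enterSafeModeTime come from the description above.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified stand-in for the router safemode exit check.
class SafemodeClockGuard {
  private final LongSupplier clock;     // assumption: injectable wall clock, in ms
  private final long startupIntervalMs; // assumption: minimum time to stay in safemode
  private long startupTime;
  private long enterSafeModeTime;
  private boolean inSafeMode = true;

  SafemodeClockGuard(LongSupplier clock, long startupIntervalMs) {
    this.clock = clock;
    this.startupIntervalMs = startupIntervalMs;
    this.startupTime = clock.getAsLong();
    this.enterSafeModeTime = this.startupTime;
  }

  /** One periodic check; returns the remaining delay in ms (0 once safemode is left). */
  long periodicCheck() {
    long now = clock.getAsLong();
    long delta = now - startupTime;
    if (delta < 0) {
      // The clock moved backwards since startup: re-anchor both timestamps
      // so the startup interval is measured from the corrected time.
      startupTime = now;
      enterSafeModeTime = now;
      delta = 0;
    }
    if (delta < startupIntervalMs) {
      return startupIntervalMs - delta; // still delaying safemode exit
    }
    inSafeMode = false;
    return 0;
  }

  boolean isInSafeMode() { return inSafeMode; }
  long getStartupTime() { return startupTime; }
  long getEnterSafeModeTime() { return enterSafeModeTime; }
}
```

With the reset in place, the remaining delay after a backwards jump is bounded by the startup interval rather than by the size of the jump, so no restart should be needed.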
> Reset startupTime and enterSafeModeTime if check time interval is negative
> during router safe mode exit check
> -------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17116
> URL: https://issues.apache.org/jira/browse/HDFS-17116
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major