[ https://issues.apache.org/jira/browse/HDFS-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011299#comment-18011299 ]
ASF GitHub Bot commented on HDFS-17815:
---------------------------------------

hadoop-yetus commented on PR #7845:
URL: https://github.com/apache/hadoop/pull/7845#issuecomment-3141534703

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 48m 57s | | trunk passed |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 11s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 1m 12s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 19s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 15s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 3m 15s | | trunk passed |
| +1 :green_heart: | shadedclient | 43m 29s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 7s | | the patch passed |
| +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 6s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 1m 6s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 11s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 3m 12s | | the patch passed |
| +1 :green_heart: | shadedclient | 43m 11s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 148m 18s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 306m 59s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7845 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 36c83d3474c2 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2dc4048a1e6048882c954b20b061fc4cd0ad7327 |
| Default Java | Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/1/testReport/ |
| Max. process+thread count | 2279 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Fix upload fsimage failure when checkpoint takes a long time
> ------------------------------------------------------------
>
> Key: HDFS-17815
> URL: https://issues.apache.org/jira/browse/HDFS-17815
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.5.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
>
> Our HDFS federation cluster has a capacity of more than 500 PB, with one NS
> containing over 600 million files. A single checkpoint takes nearly two hours.
> We found that checkpoints frequently fail because the fsimage cannot be
> uploaded to the active NameNode, which leads to repeated checkpoints. We have
> configured dfs.recent.image.check.enabled=true. After debugging, we found the
> cause: the standby NN sets lastCheckpointTime to the start time of the
> checkpoint rather than its end time. In our cluster, the standby NN's
> lastCheckpointTime is approximately 80 minutes earlier than the active NN's
> lastCheckpointTime.
> When the time since the last checkpoint on the standby NN exceeds
> dfs.namenode.checkpoint.period, the next checkpoint is performed. Because the
> active NN's lastCheckpointTime is later than the standby NN's, the interval
> measured on the active NN is still less than dfs.namenode.checkpoint.period,
> so the fsimage upload is rejected, causing the checkpoint to fail and be
> retried.
> ANN's log:
> {code:java}
> 2025-07-31 07:14:29,845 INFO [qtp231311211-8404]
> org.apache.hadoop.hdfs.server.namenode.ImageServlet: New txnid cnt is
> 126487459, expecting at least 300000000. now is 1753917269845,
> lastCheckpointTime is 1753875142580, timeDelta is 42127, expecting period at
> least 43200 unless too long since last upload.. {code}
> SNN's log:
> {code:java}
> last checkpoint start time:
> 2025-07-30 18:13:08,729 INFO [Standby State Checkpointer]
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because it has been 48047 seconds since the last checkpoint, which
> exceeds the configured interval 43200
> last checkpoint end time:
> 2025-07-30 20:11:51,330 INFO [Standby State Checkpointer]
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Checkpoint
> finished successfully.
> this time checkpoint start time:
> 2025-07-31 06:13:51,681 INFO [Standby State Checkpointer]
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because it has been 43242 seconds since the last checkpoint, which
> exceeds the configured interval 43200{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
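
The mismatch described in the issue can be checked against the numbers in the quoted logs. The following is only a minimal sketch of that arithmetic, not the actual ImageServlet or StandbyCheckpointer code: the class name `CheckpointWindowSketch` and the helper `wouldReject` are hypothetical, and the "unless too long since last upload" exception mentioned in the ANN log is deliberately left out.

```java
public class CheckpointWindowSketch {

    /**
     * Hypothetical helper: models the rejection condition implied by the ANN log.
     * An upload is treated as "too recent" when both the elapsed time and the
     * number of new transactions are below their thresholds.
     */
    static boolean wouldReject(long timeDeltaSec, long periodSec,
                               long newTxns, long txnThreshold) {
        return timeDeltaSec < periodSec && newTxns < txnThreshold;
    }

    public static void main(String[] args) {
        // Thresholds as printed in the logs (period in seconds, txn count).
        long periodSec = 43_200L;
        long txnThreshold = 300_000_000L;

        // Values taken from the ANN log line.
        long nowMs = 1_753_917_269_845L;
        long activeLastCheckpointMs = 1_753_875_142_580L;
        long newTxns = 126_487_459L;
        long activeDeltaSec = (nowMs - activeLastCheckpointMs) / 1000; // 42127, as logged

        // Value taken from the SNN log line ("it has been 43242 seconds").
        long standbyDeltaSec = 43_242L;

        // The standby measures from the previous checkpoint's start time,
        // so it already considers the period exceeded.
        System.out.println("standby triggers checkpoint: " + (standbyDeltaSec > periodSec));

        // The active measures from a later timestamp, so the same upload
        // still falls inside the period and is refused.
        System.out.println("active timeDelta (s): " + activeDeltaSec);
        System.out.println("active rejects upload: "
                + wouldReject(activeDeltaSec, periodSec, newTxns, txnThreshold));
    }
}
```

With the logged values, the standby sees 43242 s (above the 43200 s period) while the active sees 42127 s (below it), which is the window in which the upload is refused.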