[ https://issues.apache.org/jira/browse/HDFS-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013120#comment-18013120 ]
ASF GitHub Bot commented on HDFS-17815:
---------------------------------------

hadoop-yetus commented on PR #7845:
URL: https://github.com/apache/hadoop/pull/7845#issuecomment-3172879743

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 21m 58s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 46s | | trunk passed |
| +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 11s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 1m 12s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 20s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 14s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 3m 14s | | trunk passed |
| +1 :green_heart: | shadedclient | 42m 55s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 9s | | the patch passed |
| +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 18s | | the patch passed |
| +1 :green_heart: | compile | 1m 4s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 1m 4s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 0s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 12s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 3m 13s | | the patch passed |
| +1 :green_heart: | shadedclient | 44m 46s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 160m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 337m 24s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/7845 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux d38debdcc74a 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4084cfaa409d19176fa39f289def50783e7105da |
| Default Java | Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/6/testReport/ |
| Max. process+thread count | 2457 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7845/6/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Fix upload fsimage failure when checkpoint takes a long time
> ------------------------------------------------------------
>
>                 Key: HDFS-17815
>                 URL: https://issues.apache.org/jira/browse/HDFS-17815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.5.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Major
>              Labels: pull-request-available
>
> Our HDFS federation cluster has a capacity of more than 500 PB, with one NS
> containing over 600 million files. A single checkpoint takes nearly two hours.
> We found that checkpoints frequently fail because the fsimage cannot be
> uploaded to the active NameNode, which leads to repeated checkpoints. We have
> configured dfs.recent.image.check.enabled=true. After debugging, we found the
> cause: the standby NN sets lastCheckpointTime to the start time of the
> checkpoint rather than its end time. In our cluster, the standby NN's
> lastCheckpointTime is approximately 80 minutes earlier than the active NN's.
> When the time since the last checkpoint on the standby NN exceeds
> dfs.namenode.checkpoint.period, the next checkpoint is performed. Because the
> active NN's lastCheckpointTime is later than the standby NN's, the interval
> measured on the active NN is still less than dfs.namenode.checkpoint.period,
> so the fsimage upload is rejected, and the checkpoint fails and is retried.
> ANN's log:
> {code:java}
> 2025-07-31 07:14:29,845 INFO [qtp231311211-8404] org.apache.hadoop.hdfs.server.namenode.ImageServlet: New txnid cnt is 126487459, expecting at least 300000000. now is 1753917269845, lastCheckpointTime is 1753875142580, timeDelta is 42127, expecting period at least 43200 unless too long since last upload..
> {code}
> SNN's log:
> {code:java}
> last checkpoint start time:
> 2025-07-30 18:13:08,729 INFO [Standby State Checkpointer] org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because it has been 48047 seconds since the last checkpoint, which exceeds the configured interval 43200
> last checkpoint end time:
> 2025-07-30 20:11:51,330 INFO [Standby State Checkpointer] org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Checkpoint finished successfully.
> current checkpoint start time:
> 2025-07-31 06:13:51,681 INFO [Standby State Checkpointer] org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because it has been 43242 seconds since the last checkpoint, which exceeds the configured interval 43200{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
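
The mismatch described in the issue can be checked with simple arithmetic on the values in the two logs. The following is a minimal Java sketch, not Hadoop's implementation: the class and method names are hypothetical, and it models only a time-based comparison against the 43200-second period, ignoring the transaction-count and "too long since last upload" conditions that the ANN log message also mentions.

```java
public class CheckpointTimeSketch {
    /** Checkpoint period in seconds, as reported in both logs. */
    static final long PERIOD_SECONDS = 43200L;

    /** Whole seconds elapsed between two epoch-millisecond timestamps. */
    static long secondsSince(long nowMs, long lastCheckpointMs) {
        return (nowMs - lastCheckpointMs) / 1000L;
    }

    /** Simplified, time-only check (assumption): has a full period elapsed? */
    static boolean periodElapsed(long elapsedSeconds) {
        return elapsedSeconds >= PERIOD_SECONDS;
    }

    public static void main(String[] args) {
        // Standby NN: elapsed seconds taken from the SNN log, measured from
        // the start of its previous checkpoint.
        long standbyElapsed = 43242L;

        // Active NN: "now" and "lastCheckpointTime" taken from the ANN log.
        long activeElapsed = secondsSince(1753917269845L, 1753875142580L);

        System.out.println("standby triggers checkpoint: " + periodElapsed(standbyElapsed));
        System.out.println("active elapsed seconds: " + activeElapsed);
        System.out.println("active accepts upload by time: " + periodElapsed(activeElapsed));
    }
}
```

With the logged values, the standby NN sees 43242 seconds and starts a checkpoint, while the active NN computes 42127 seconds, which matches the timeDelta in its log and is below the period, so under this simplified check the upload is refused.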