[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650104#comment-17650104 ]
ASF GitHub Bot commented on HDFS-16689:
---------------------------------------
hadoop-yetus commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1360928620
:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 41s | | trunk passed |
| +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 20s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 6s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 30s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 47s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 17s | | the patch passed |
| +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 12s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 52s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 260 unchanged - 1 fixed = 260 total (was 261) |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 13s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 21s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 302m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 408m 42s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4744 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c23c319f91b7 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0d25fbee414a5cf318a3b9b9c831f5ae0aaf7d18 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/testReport/ |
| Max. process+thread count | 3334 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -----------------------------------------------------------------------------
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The Standby NameNode crashes when transitioning to Active with an in-progress
> tailer, with an error message like the one below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
> 	... 36 more
> {code}
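The top frame shows the exception is raised in `FSEditLog.openForWrite`, and the message states the precondition: the new active NameNode refuses to open a write segment at txid X while a stream covering X is still readable. A minimal sketch of that precondition, assuming a stream counts as "available for read" when it holds any txid at or after the new segment's start. `Segment` and `OpenForWriteGuard` are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a readable edit-log stream covering [firstTxId, lastTxId]. */
final class Segment {
    final long firstTxId;
    final long lastTxId;

    Segment(long firstTxId, long lastTxId) {
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
    }

    @Override
    public String toString() {
        return "Segment[" + firstTxId + ", " + lastTxId + "]";
    }
}

/** Simplified model (not Hadoop code) of the "no readable stream at the write position" check. */
final class OpenForWriteGuard {

    // Assumption: a stream is "available for read" if it holds any txid >= segmentTxId.
    static void openForWrite(long segmentTxId, List<Segment> readable) {
        List<Segment> available = new ArrayList<>();
        for (Segment s : readable) {
            if (s.lastTxId >= segmentTxId) {
                available.add(s);
            }
        }
        if (!available.isEmpty()) {
            throw new IllegalStateException(String.format(
                "Cannot start writing at txid %d when there is a stream available for read: %s",
                segmentTxId, available));
        }
    }

    public static void main(String[] args) {
        // A JournalNode still serves txids 3..5 that the new active never replayed.
        List<Segment> readable = Arrays.asList(new Segment(3, 5));
        try {
            openForWrite(3, readable); // lastAppliedTxId = 2, so writing would start at 3
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        openForWrite(6, readable); // after replaying 3..5, nothing beyond 5 is left to read
        System.out.println("opened at txid 6");
    }
}
```

In this model the guard is only satisfied once the NameNode has replayed every readable edit, which is why a catch-up step that replays nothing leads straight to the exception.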
> After tracing, I found a critical bug in
> *EditLogTailer#catchupDuringFailover()* when
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()*
> tries to replay all missed edits from the JournalNodes with
> *onlyDurableTxns=true*, so it may fail to replay any edits at all when some
> JournalNodes are abnormal.
> To reproduce, suppose:
> - There are 2 NameNodes, NN0 and NN1, which are Active and Standby
> respectively, and 3 JournalNodes, JN0, JN1 and JN2.
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only
> successfully syncs them to JN1 and JN2, because JN0 is abnormal (e.g. in a
> long GC pause, on a bad network, or restarting).
> - NN1's lastAppliedTxId is 2, and at this moment we try to fail over the
> active role from NN0 to NN1.
> - When NN1 selects input streams with *fromTxnId=3* and
> *onlyDurableTxns=true*, it only gets responses from JN0 and JN1, reporting
> txn counts of 0 and 3 respectively, because JN2 is now abnormal (e.g. in a
> GC pause, on a bad network, or restarting).
> - NN1 therefore cannot replay any edits from the JournalNodes starting at
> *fromTxnId=3*, because *maxAllowedTxns* is 0.
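The arithmetic behind *maxAllowedTxns* being 0 can be modeled as follows. This is a sketch, not the JournalNode selection code: it assumes that with *onlyDurableTxns=true* the reader may only replay the txn count that a majority of *all* JournalNodes is known to hold, computed from whichever responses arrived. `DurableTxnModel` and `durableTxnCount` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified model (hypothetical, not Hadoop code) of a majority-backed txn count. */
final class DurableTxnModel {

    /**
     * Largest txn count that at least a majority of ALL journal nodes is known
     * to hold, given only the responses that actually arrived.
     */
    static int durableTxnCount(List<Integer> responseCounts, int totalJournalNodes) {
        int majority = totalJournalNodes / 2 + 1;
        if (responseCounts.size() < majority) {
            throw new IllegalArgumentException("need responses from a majority of JNs");
        }
        List<Integer> sorted = new ArrayList<>(responseCounts);
        Collections.sort(sorted); // ascending
        // Every responder at or above index (size - majority) holds at least this
        // many txns, and there are exactly `majority` such responders.
        return sorted.get(sorted.size() - majority);
    }

    public static void main(String[] args) {
        // Scenario above: JN0 missed the edits (0), JN1 has 3, JN2 did not answer.
        System.out.println(durableTxnCount(Arrays.asList(0, 3), 3));
        // Same cluster if JN2, which also holds the 3 edits, had answered.
        System.out.println(durableTxnCount(Arrays.asList(0, 3, 3), 3));
    }
}
```

With only two of the three JournalNodes answering, the single lagging response from JN0 pulls the majority-backed count down to 0, even though JN1 and JN2 both hold all three edits; with all three answering, the count would be 3.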
> So I think the Standby NameNode should call *catchupDuringFailover()* with
> *onlyDurableTxns=false*, so that it can replay all missed edits from the
> JournalNodes.
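To see how the proposed flag change avoids the crash, here is a small end-to-end model of the failover in the scenario above: catch up from lastAppliedTxId=2, then try to open the next write segment. It assumes, hypothetically, that with *onlyDurableTxns=false* the reader may replay up to the highest txn count any responding JournalNode reports, and that both modes still wait for a majority of responses. `FailoverCatchupModel` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of catch-up during failover under both flag values. */
final class FailoverCatchupModel {

    /** Txns the standby may replay, given per-JN counts from the responders. */
    static int maxReplayable(List<Integer> responseCounts, int totalJournalNodes,
                             boolean onlyDurableTxns) {
        int majority = totalJournalNodes / 2 + 1;
        if (responseCounts.size() < majority) {
            throw new IllegalArgumentException("need responses from a majority of JNs");
        }
        List<Integer> sorted = new ArrayList<>(responseCounts);
        Collections.sort(sorted); // ascending
        return onlyDurableTxns
            ? sorted.get(sorted.size() - majority) // count backed by a majority of all JNs
            : sorted.get(sorted.size() - 1);       // highest count any responder reported
    }

    /** lastAppliedTxId after replaying from lastApplied + 1. */
    static long catchup(long lastApplied, List<Integer> responseCounts,
                        int totalJournalNodes, boolean onlyDurableTxns) {
        return lastApplied + maxReplayable(responseCounts, totalJournalNodes, onlyDurableTxns);
    }

    public static void main(String[] args) {
        List<Integer> responses = Arrays.asList(0, 3); // JN0 and JN1; JN2 unresponsive
        long lastApplied = 2;
        long lastReadableTxId = 5; // JN1 still serves txids 3..5
        for (boolean onlyDurable : new boolean[] {true, false}) {
            long applied = catchup(lastApplied, responses, 3, onlyDurable);
            // Opening a segment at applied + 1 fails if unreplayed edits remain readable.
            boolean fails = lastReadableTxId > applied;
            System.out.printf("onlyDurableTxns=%b: lastAppliedTxId=%d, openForWrite(%d) %s%n",
                onlyDurable, applied, applied + 1, fails ? "fails" : "succeeds");
        }
    }
}
```

Running `main` prints:

```
onlyDurableTxns=true: lastAppliedTxId=2, openForWrite(3) fails
onlyDurableTxns=false: lastAppliedTxId=5, openForWrite(6) succeeds
```

The model also exposes the trade-off of the flag: with it off, the replay bound no longer requires a majority of JournalNodes to hold the edits, only that some responding JournalNode has them.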
--
This message was sent by Atlassian Jira
(v8.20.10#820010)