[
https://issues.apache.org/jira/browse/HDFS-17166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759134#comment-17759134
]
ASF GitHub Bot commented on HDFS-17166:
---------------------------------------
hadoop-yetus commented on PR #5990:
URL: https://github.com/apache/hadoop/pull/5990#issuecomment-1693804418
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 35s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 31s | | trunk passed |
| +1 :green_heart: | compile | 0m 33s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 30s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 26s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 35s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 26s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 2s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 13s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 24s | | the patch passed |
| +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 25s | | the patch passed |
| +1 :green_heart: | compile | 0m 22s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 22s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 15s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5990/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 5 new + 1 unchanged - 0 fixed = 6 total (was 1) |
| +1 :green_heart: | mvnsite | 0m 24s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 22s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 0m 55s | | the patch passed |
| +1 :green_heart: | shadedclient | 23m 26s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 29m 32s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5990/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 120m 36s | | |
| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRpc |
| | hadoop.hdfs.server.federation.router.TestRouterQuota |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5990/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5990 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 4bac7adc94c4 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / da69e7f403c95c8daf7305eea60432f4fc57d9bb |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5990/1/testReport/ |
| Max. process+thread count | 2646 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5990/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> RBF: Throwing NoNamenodesAvailableException for a long time, when failover
> --------------------------------------------------------------------------
>
> Key: HDFS-17166
> URL: https://issues.apache.org/jira/browse/HDFS-17166
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments:
> fix_NoNamenodesAvailableException_long_time_when_ns_failover.patch,
> image-2023-08-26-00-24-02-016.png, image-2023-08-26-00-25-42-086.png
>
>
> During a nameservice (ns) failover, the router may record that the ns has no
> active namenode (nn) and then fail to find the active nn for about 1 minute.
> The client reports an error once its retries are exhausted, and the router
> cannot serve the ns for that whole period.
> 11:52:44 Start reporting
> !image-2023-08-26-00-24-02-016.png|width=800,height=100!
> 11:53:46 end reporting
> !image-2023-08-26-00-25-42-086.png|width=800,height=50!
>
> At this point the failover has completed successfully in the ns, and a client
> that connects directly to the active namenode can access it, but clients
> cannot access the ns through the router for up to a minute.
>
> *There is a bug in this logic:*
> * An ns starts to fail over.
> * For a while the ns has no active nn, and the router reports this status
> (no active nn) to the state store.
> * Some time later, the router pulls the state store data to update its
> cache, and the cache records that the ns has no active nn.
> * The failover completes successfully, at which point the ns actually has an
> active nn.
> * Assume it is not yet time for the router to update the cache again.
> * A client sends a request for the ns to the router, and the router
> accesses the first nn of the ns in its cache (which lists no active nn).
> * Unfortunately, that nn really is standby, so the request fails and
> enters the exception handling logic. The router finds that the cache has no
> active nn for the ns and throws NoNamenodesAvailableException.
> * The NoNamenodesAvailableException is wrapped as a RetriableException,
> which causes the client to retry. Since every router picks the same real
> standby nn from its cache (it is always the first one in the cache and has
> a high priority), a NoNamenodesAvailableException is thrown on every attempt
> until the router updates the cache from the state store.
>
> *Fix the bug*
> When the router's cache shows no active nn for an ns that in reality has one,
> and a client request ends in a NoNamenodesAvailableException, this proves that
> the requested nn is a real standby nn. The priority of this nn should be
> lowered so that the next request finds the real active nn, instead of
> repeatedly hitting the real standby nn and leaving the router unable to serve
> the ns until the next cache update.
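
The scenario and the proposed fix in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the attached patch or from PR #5990: the class `FailoverCacheSketch`, its `Namenode` entries, and the `demoteOnFailure` flag are all hypothetical names, and `IllegalStateException` stands in for `NoNamenodesAvailableException`. It models a single router whose cache lists every nn as standby while one of them has in fact become active.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of one router's cached view of a nameservice; not Hadoop code. */
class FailoverCacheSketch {
  enum State { ACTIVE, STANDBY }

  /** cachedState is what the router believes; realState is what the nn would answer. */
  static final class Namenode {
    final String id;
    final State cachedState;
    final State realState;

    Namenode(String id, State cachedState, State realState) {
      this.id = id;
      this.cachedState = cachedState;
      this.realState = realState;
    }
  }

  // Cached namenodes in the order the router tries them (index 0 first).
  private final List<Namenode> ordered;

  FailoverCacheSketch(List<Namenode> initial) {
    this.ordered = new ArrayList<>(initial);
  }

  /**
   * Sends one request to the first cached nn and returns the id of the nn that
   * served it. IllegalStateException stands in for NoNamenodesAvailableException.
   */
  String invoke(boolean demoteOnFailure) {
    Namenode first = ordered.get(0);
    if (first.realState == State.ACTIVE) {
      return first.id;
    }
    boolean cacheHasActive = false;
    for (Namenode nn : ordered) {
      cacheHasActive |= nn.cachedState == State.ACTIVE;
    }
    if (!cacheHasActive && demoteOnFailure) {
      // Proposed fix: this nn just proved to be a real standby, so try it last next time.
      ordered.remove(0);
      ordered.add(first);
    }
    throw new IllegalStateException("no active namenode; tried " + first.id);
  }

  public static void main(String[] args) {
    List<Namenode> stale = Arrays.asList(
        new Namenode("nn0", State.STANDBY, State.STANDBY),
        // nn1 became active after the cache was last refreshed.
        new Namenode("nn1", State.STANDBY, State.ACTIVE));
    FailoverCacheSketch router = new FailoverCacheSketch(stale);
    for (int attempt = 1; attempt <= 2; attempt++) {
      try {
        System.out.println("attempt " + attempt + " served by " + router.invoke(true));
      } catch (IllegalStateException e) {
        System.out.println("attempt " + attempt + " failed: " + e.getMessage());
      }
    }
  }
}
```

With `demoteOnFailure` set to `false`, every call in this model fails against `nn0` until the cache is rebuilt, which mirrors the minute-long outage described above. With it set to `true`, the first call still fails but the retry reaches `nn1`.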
--
This message was sent by Atlassian Jira
(v8.20.10#820010)