[
https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689481#comment-17689481
]
ASF GitHub Bot commented on HDFS-16918:
---------------------------------------
hadoop-yetus commented on PR #5396:
URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1432445119
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 50m 49s | | trunk passed |
| +1 :green_heart: | compile | 1m 28s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 8s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 29s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 32s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 35s | | trunk passed |
| +1 :green_heart: | shadedclient | 29m 26s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 30s | | the patch passed |
| +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 23s | | the patch passed |
| +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 13s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 54s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 325 unchanged - 0 fixed = 326 total (was 325) |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| -1 :x: | javadoc | 0m 53s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 29s | | the patch passed |
| +1 :green_heart: | shadedclient | 29m 11s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 251m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. |
| | | 385m 0s | | |
| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger |
| | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.TestFsck |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5396 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 5d0f90e11c93 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3400be46ce4cf29409a2b031a8860a80d61313df |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/testReport/ |
| Max. process+thread count | 2431 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> Optionally shut down datanode if it does not stay connected to active namenode
> ------------------------------------------------------------------------------
>
> Key: HDFS-16918
> URL: https://issues.apache.org/jira/browse/HDFS-16918
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> While deploying HDFS on an Envoy proxy setup, network connection issues or
> packet loss can be observed, depending on the socket timeout configured at
> Envoy. The Envoy proxies form a transparent communication mesh in which each
> application sends and receives packets to and from localhost and is unaware
> of the network topology.
> The primary purpose of Envoy is to make the network transparent to
> applications so that network issues can be identified reliably. However,
> such a proxy-based setup can sometimes result in socket connection issues
> between the datanode and the namenode.
> Many deployment frameworks provide auto-start functionality when any of the
> hadoop daemons are stopped. If a given datanode does not stay connected to
> the active namenode in the cluster, i.e. does not receive a heartbeat
> response from the active namenode in time (even though the active namenode
> is not terminated), it is not of much use. We should provide configurable
> behavior such that if a datanode cannot receive a heartbeat response from
> the active namenode within a configurable duration, it terminates itself to
> avoid impacting the availability SLA. This is specifically helpful when the
> underlying deployment or observability framework (e.g. K8s) can start the
> datanode automatically upon its shutdown (unless it is being restarted as
> part of a rolling upgrade) and help the newly brought-up datanode (in the
> case of K8s, a new pod on dynamically changing nodes) establish new socket
> connections to the active and standby namenodes. This should be opt-in
> behavior, not the default.
>
> In a distributed system, it is essential to have robust fail-fast mechanisms
> in place to prevent issues related to network partitioning. The system must
> be designed to prevent further degradation of availability and consistency
> in the event of a network partition. Several distributed systems offer
> fail-safe approaches, and for some, partition tolerance is so critical that
> even a few seconds of heartbeat loss can trigger the removal of an
> application server instance from the cluster. For instance, a majority of
> ZooKeeper clients use ephemeral nodes for this purpose to keep the system
> reliable, fault-tolerant, and strongly consistent in the event of a network
> partition.
> From the HDFS architecture viewpoint, it is crucial to understand the
> critical role that the active and observer namenodes play in file system
> operations. In a large-scale cluster, if the datanodes holding the same
> block (primary and replicas) lose connection to both active and observer
> namenodes for a significant amount of time, delaying the process of shutting
> such datanodes down and restarting them to re-establish the connection with
> the namenodes (assuming the active namenode is alive, an assumption that
> matters in the event of a network partition) further deteriorates the
> availability of the service. This scenario underscores the importance of
> resolving network partitioning.
> This is a real use case for HDFS, and it is not prudent to assume that every
> deployment or cluster management application must be able to restart
> datanodes based on JMX metrics, as this would introduce yet another
> application to resolve the network partition impact on HDFS. Besides,
> popular cluster management applications are not typically used in all
> cloud-native environments. Even where such applications are deployed,
> certain security constraints may restrict their access to JMX metrics and
> prevent them from interfering with HDFS operations; often only applications
> that merely trigger alerts for users based on set parameters (for instance,
> missing blocks > 0) are allowed to access JMX metrics.
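The opt-in fail-fast behavior described above can be sketched as a small staleness check: the datanode records the time of the last successful heartbeat response from the active namenode and terminates itself once that staleness exceeds a configured threshold. This is a minimal illustration only; the class and parameter names (`HeartbeatStalenessCheck`, `maxStalenessMs`) are hypothetical and not taken from the actual patch in PR #5396.

```java
// Hypothetical sketch of an opt-in heartbeat-staleness fail-fast check.
// Names are illustrative; the real patch may differ.
public class HeartbeatStalenessCheck {

    // Configured timeout; a value <= 0 disables the check (opt-in behavior).
    private final long maxStalenessMs;
    private volatile long lastActiveResponseMs;

    public HeartbeatStalenessCheck(long maxStalenessMs, long startMs) {
        this.maxStalenessMs = maxStalenessMs;
        this.lastActiveResponseMs = startMs;
    }

    /** Record a heartbeat response received from the active namenode. */
    public void onActiveHeartbeatResponse(long nowMs) {
        lastActiveResponseMs = nowMs;
    }

    /** True when the datanode should shut itself down and let the
     *  deployment framework (e.g. K8s) restart it. */
    public boolean shouldShutdown(long nowMs) {
        return maxStalenessMs > 0 && (nowMs - lastActiveResponseMs) > maxStalenessMs;
    }

    public static void main(String[] args) {
        HeartbeatStalenessCheck check = new HeartbeatStalenessCheck(10_000, 0);
        check.onActiveHeartbeatResponse(5_000);
        System.out.println(check.shouldShutdown(12_000)); // 7s stale: keep running
        System.out.println(check.shouldShutdown(20_000)); // 15s stale: terminate
    }
}
```

With the threshold left at its disabled default, `shouldShutdown` never fires, which preserves today's behavior for deployments that do not opt in.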
--
This message was sent by Atlassian Jira
(v8.20.10#820010)