[ https://issues.apache.org/jira/browse/HDFS-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048837#comment-18048837 ]
ASF GitHub Bot commented on HDFS-17867:
---------------------------------------
hadoop-yetus commented on PR #8154:
URL: https://github.com/apache/hadoop/pull/8154#issuecomment-3707013400
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 44s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 21m 55s | | trunk passed |
| +1 :green_heart: | compile | 0m 48s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 48s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 52s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | spotbugs | 2m 1s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 24s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 43s | | the patch passed |
| +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 38s | | the patch passed |
| +1 :green_heart: | compile | 0m 41s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 41s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 25s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 252 unchanged - 0 fixed = 253 total (was 252) |
| +1 :green_heart: | mvnsite | 0m 47s | | the patch passed |
| -1 :x: | javadoc | 0m 36s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 23 new + 9977 unchanged - 23 fixed = 10000 total (was 10000) |
| -1 :x: | javadoc | 0m 35s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 21 new + 9676 unchanged - 0 fixed = 9697 total (was 9676) |
| +1 :green_heart: | spotbugs | 1m 58s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 25s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 172m 13s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| +1 :green_heart: | asflicense | 0m 27s | | The patch does not generate ASF License warnings. |
| | | 238m 18s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/8154 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 050d0a3b7fd4 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7cd4d36debfdb10dee7aa9937637da65431efffa |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/testReport/ |
| Max. process+thread count | 4601 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8154/3/console |
| versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> Implement a new NetworkTopology that supports weighted random choice
> --------------------------------------------------------------------
>
> Key: HDFS-17867
> URL: https://issues.apache.org/jira/browse/HDFS-17867
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: khazhen
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> In BlockPlacementPolicyDefault, each DN in the cluster is selected with
> roughly equal probability. However, our cluster contains various types of
> DataNode machines with completely different hardware specifications.
> For example, some machines have more disks, higher-bandwidth NICs, and
> higher-performance CPUs, while some older machines are the opposite:
> their service capacity is much lower than that of the newer machines.
> Therefore, as the cluster load increases, these lower-performance machines
> immediately become bottlenecks, causing the cluster's performance to
> decline or even affecting availability (for example, slow DataNodes or
> pipeline recovery failures).
> The root cause of this problem is that we have no way to adjust the load
> distribution between different DataNodes.
> h2. Solution
> To solve this problem, we implemented a NetworkTopology that supports
> weighted random choice.
> We can configure a weight value for each DN, similar to how we configure
> racks. For clusters containing DNs with different hardware specifications,
> introducing this feature has several benefits:
> # Better load balancing between DNs. High-performance machines can handle
> more traffic, and the overall service capacity of the cluster improves.
> # Higher resource utilization.
> # Reduced Balancer overhead. Typically, higher-performance machines mean
> more hard drives and larger capacity. If we configure weights according to
> capacity ratios, the amount of data the Balancer needs to move is
> significantly reduced. (Of course, the Balancer is still needed when a new
> DN is added.)
> Our production cluster has many different types of DN machine hardware,
> and some machines have up to 10 times the capacity of the older ones.
> Additionally, some machines are co-deployed with many other services,
> causing them to become slow nodes as soon as the load increases.
> After introducing this feature, we let the independently deployed,
> higher-performance, larger-capacity machines handle more traffic. Both the
> overall IO performance and the availability of the cluster have improved
> significantly.
> Our cluster is still on Hadoop 2.x, so we directly extended the
> NetworkTopology class to implement this feature. However, in the latest
> Hadoop, DFSNetworkTopology has been introduced as the default
> implementation. Therefore, I re-implemented this feature on top of
> DFSNetworkTopology. The details are introduced below.
> h2. Implementation
> Let's look at the chooseRandomWithStorageType method of
> DFSNetworkTopology. Suppose we have 3 DNs in the cluster: dn1(/r1),
> dn2(/r1), and dn3(/r2). The topology tree looks like this:
> {code:java}
> /
>   /r1
>     /dn1
>     /dn2
>   /r2
>     /dn3
> {code}
> There are 3 core steps to choose a random DN from the root scope:
> 1. Compute the number of available nodes under r1 and r2, which is [2, 1]
> in this case.
> 2. Perform a weighted random choice from [r1, r2] with weights [2, 1];
> assume r1 is chosen.
> 3. Since r1 is a rack (an inner node), randomly choose a DN from its
> children list [dn1, dn2].
> The probability of each of these three DNs being chosen is 1/3.
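> To make steps 1 and 2 concrete, here is a minimal, self-contained sketch
> of a count-weighted random choice (illustrative only; the actual
> DFSNetworkTopology code is organized differently):
> {code:java}
> import java.util.Random;
>
> public class WeightedChoiceDemo {
>   public static void main(String[] args) {
>     int[] counts = {2, 1};               // available DNs under r1 and r2
>     int total = 2 + 1;
>     int r = new Random().nextInt(total); // uniform in [0, total)
>     int chosen = 0;
>     while (r >= counts[chosen]) {        // walk the prefix sums
>       r -= counts[chosen];
>       chosen++;
>     }
>     // chosen == 0 (r1) with probability 2/3; chosen == 1 (r2) with 1/3.
>     System.out.println(chosen == 0 ? "r1" : "r2");
>   }
> }
> {code}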
> Now we want to introduce a weighted random choice from [dn1, dn2, dn3]
> with weights [3, 1, 2]. A simple and straightforward solution is to add
> virtual nodes to the topology tree, so the new topology tree looks like
> this:
> {code:java}
> /
>   /r1
>     /dn1'
>     /dn1'
>     /dn1'
>     /dn2'
>   /r2
>     /dn3'
>     /dn3'
> {code}
> The probability of each of these virtual nodes being chosen is 1/6.
> dn1 has 3 virtual nodes, so the probability of choosing dn1 is 3/6 = 1/2;
> likewise it is 1/6 for dn2 and 2/6 = 1/3 for dn3.
> However, upon reviewing steps 1 through 3, we can see that steps 1 and 2
> only care about the number of DataNodes under an inner node. This means we
> don't need to actually add virtual nodes to the topology tree. Instead, we
> can introduce a new method getNodeCount(Node n), which accepts a node as
> input and returns the number of DataNodes under n. In the existing
> DFSNetworkTopology class, it simply returns the number of physical
> DataNodes under n. Then we can add a new subclass of DFSNetworkTopology
> that overrides getNodeCount(Node n) to return the total weight of all
> DataNodes under n.
> Step 3 needs to be modified as well: we should perform a weighted random
> choice from the child list rather than a simple uniform random choice.
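> As a rough sketch of what such a subclass could look like (hypothetical:
> getWeight(...) and chooseWeightedChild(...) are placeholder helpers, and
> the visibility and exact signature of getNodeCount(Node n) in the real
> patch may differ):
> {code:java}
> import java.util.List;
> import java.util.concurrent.ThreadLocalRandom;
>
> import org.apache.hadoop.hdfs.net.DFSNetworkTopology;
> import org.apache.hadoop.net.InnerNode;
> import org.apache.hadoop.net.Node;
>
> public class DFSNetworkTopologyWithWeight extends DFSNetworkTopology {
>
>   // Hypothetical lookup of a DN's configured weight (via the
>   // DNSToWeightMapping described below); placeholder body.
>   private int getWeight(Node node) {
>     return 1; // real code would consult the configured weight mapping
>   }
>
>   // Steps 1 and 2 become weight-aware automatically once an inner node
>   // "counts" as the total weight of the DataNodes below it.
>   protected int getNodeCount(Node n) {
>     if (!(n instanceof InnerNode)) {
>       return getWeight(n); // a leaf counts as its weight, not as 1
>     }
>     int total = 0;
>     for (Node child : ((InnerNode) n).getChildren()) {
>       total += getNodeCount(child);
>     }
>     return total;
>   }
>
>   // Step 3, modified: a weighted (prefix-sum) pick from a rack's
>   // children instead of a uniform pick.
>   private Node chooseWeightedChild(List<Node> children) {
>     int total = 0;
>     for (Node c : children) {
>       total += getWeight(c);
>     }
>     int r = ThreadLocalRandom.current().nextInt(total);
>     for (Node c : children) {
>       r -= getWeight(c);
>       if (r < 0) {
>         return c;
>       }
>     }
>     return null; // unreachable while total > 0
>   }
> }
> {code}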
> h2. How to configure the weight of a datanode
> A new interface named DNSToWeightMapping is introduced to map a DNS name
> to a weight. Currently, there is only one implementation:
> TableDataNodeWeightMapping, which is similar to TableMapping. It reads a
> two-column text file in which the columns are separated by whitespace. The
> first column is an IP address, and the second column specifies the weight
> that the address maps to. For example:
> {code:java}
> 1.2.3.4 3
> 2.3.4.5 1
> 3.4.5.6 2
> {code}
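> For illustration, the interface could look roughly like this (a hedged
> sketch; the actual method name and signature in the PR may differ):
> {code:java}
> public interface DNSToWeightMapping {
>   // Returns the configured weight for the given host or IP address,
>   // falling back to a default weight (e.g. 1) for unlisted hosts.
>   int getWeight(String host);
> }
> {code}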
> To enable this feature, you need to (a sample hdfs-site.xml snippet is
> shown after this list):
> # set
> dfs.net.topology.impl=org.apache.hadoop.hdfs.net.DFSNetworkTopologyWithWeight
> # create a text file that contains the weight mapping information
> # set dfs.net.topology.weight.table.file.name=<path to the file>
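> For example, the corresponding hdfs-site.xml entries might look like this
> (the table file path below is a placeholder):
> {code:xml}
> <property>
>   <name>dfs.net.topology.impl</name>
>   <value>org.apache.hadoop.hdfs.net.DFSNetworkTopologyWithWeight</value>
> </property>
> <property>
>   <name>dfs.net.topology.weight.table.file.name</name>
>   <value>/etc/hadoop/conf/datanode-weights.txt</value>
> </property>
> {code}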
> h2. Difference from AvailableSpaceBlockPlacementPolicy
> AvailableSpaceBlockPlacementPolicy is useful when we add new nodes to the
> cluster: it makes the newly added nodes be chosen with a slightly higher
> probability than the old ones, so the cluster tends to become balanced
> over time. The real-time load of the newly added nodes won't change much.
> This feature instead focuses on real-time load balancing between
> DataNodes; it is useful in clusters that contain many different types of
> data nodes.
> h2. Conclusion
> I have submitted a PR. More suggestions and discussions are welcome.
> By the way, making the weight of nodes reconfigurable without restarting
> the NameNode would be a very useful feature: it would allow us to quickly
> adjust weights based on the actual load of the cluster. I will introduce
> that feature in a separate JIRA after this one is completed.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]