This is an automated email from the ASF dual-hosted git repository.

slfan1989 pushed a commit to branch branch-3.4.0
in repository https://gitbox.apache.org/repos/asf/hadoop.git


The following commit(s) were added to refs/heads/branch-3.4.0 by this push:
     new b947ccaef1e5 HADOOP-19039. Hadoop 3.4.0 Highlight big features and 
improvements. (#6462) Contributed by Shilun Fan.
b947ccaef1e5 is described below

commit b947ccaef1e56749163a3fc6831a5682f2c51bef
Author: slfan1989 <55643692+slfan1...@users.noreply.github.com>
AuthorDate: Thu Jan 25 15:42:21 2024 +0800

    HADOOP-19039. Hadoop 3.4.0 Highlight big features and improvements. (#6462) 
Contributed by Shilun Fan.
    
    Reviewed-by: He Xiaoqiao <hexiaoq...@apache.org>
    Signed-off-by: Shilun Fan <slfan1...@apache.org>
---
 hadoop-project/src/site/markdown/index.md.vm | 56 ++++------------------------
 1 file changed, 8 insertions(+), 48 deletions(-)

diff --git a/hadoop-project/src/site/markdown/index.md.vm 
b/hadoop-project/src/site/markdown/index.md.vm
index ad2c46d2f61f..f3f9c41deb5c 100644
--- a/hadoop-project/src/site/markdown/index.md.vm
+++ b/hadoop-project/src/site/markdown/index.md.vm
@@ -28,11 +28,8 @@ S3A: Upgrade AWS SDK to V2
 
 [HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: 
Upgrade AWS SDK to V2
 
-The S3A connector now uses the V2 AWS SDK.  This is a significant change at 
the source code level.
-
-Any applications using the internal extension/override points in the 
filesystem connector are likely to break.
-
-Consult the document aws\_sdk\_upgrade for the full details.
+This release upgrades Hadoop's AWS connector S3A from AWS SDK for Java V1 to AWS SDK for Java V2.
+This is a significant change that offers a number of new features, including the ability to work with Amazon S3 Express One Zone storage, the new high-performance, single-AZ storage class.
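
The public `FileSystem` API is unchanged by the SDK swap; the V1-to-V2 move mainly affects code that reaches into S3A's internal extension points (for example, custom credential providers written against the V1 `AWSCredentialsProvider` interface need to move to the V2 `AwsCredentialsProvider`). A minimal sketch of ordinary S3A usage, with a hypothetical bucket name:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AListExample {
  public static void main(String[] args) throws Exception {
    // Credentials come from core-site.xml or the usual AWS provider chain.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
    for (FileStatus st : fs.listStatus(new Path("s3a://example-bucket/data/"))) {
      System.out.println(st.getPath() + " " + st.getLen());
    }
  }
}
```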
 
 HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
 ----------------------------------------
@@ -40,16 +37,11 @@ HDFS DataNode Split one FsDatasetImpl lock to volume grain 
locks
 [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one 
FsDatasetImpl lock to volume grain locks.
 
 Throughput is one of the core performance metrics for a DataNode instance.
-
-However, it does not reach the best performance  especially for Federation 
deploy all the time although there are different improvement,
-
+However, it has not always reached the best performance, especially for Federation deployments, despite various improvements,
 because of the global coarse-grained lock.
-
-These series issues (include 
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), 
[HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), 
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and 
[HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).) try to split 
the global coarse-grain lock to
-
-fine-grain lock which is double level lock for blockpool and volume, to 
improve the throughput and avoid lock impacts between
-
-blockpools and volumes.
+A series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429))
+splits the global coarse-grained lock into fine-grained locks, a two-level lock on block pool and volume,
+to improve throughput and avoid lock contention between block pools and volumes.
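
As an illustration of the locking idea (a generic sketch, not the actual `FsDatasetImpl` code): instead of one dataset-wide lock, an operation takes a shared lock at the block pool level and an exclusive lock only on the volume it touches, so work on different volumes no longer serializes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TwoLevelLockSketch {
  private final Map<String, ReadWriteLock> blockPoolLocks = new ConcurrentHashMap<>();
  private final Map<String, ReadWriteLock> volumeLocks = new ConcurrentHashMap<>();

  private static ReadWriteLock lockFor(Map<String, ReadWriteLock> locks, String key) {
    return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
  }

  public void writeToVolume(String bpId, String volumeId, Runnable op) {
    ReadWriteLock bp = lockFor(blockPoolLocks, bpId);
    ReadWriteLock vol = lockFor(volumeLocks, bpId + "/" + volumeId);
    bp.readLock().lock();       // shared: volumes of one pool proceed in parallel
    try {
      vol.writeLock().lock();   // exclusive only for this one volume
      try {
        op.run();               // e.g. finalize a replica on this volume
      } finally {
        vol.writeLock().unlock();
      }
    } finally {
      bp.readLock().unlock();
    }
  }
}
```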
 
 YARN Federation improvements
 ----------------------------------------
@@ -57,7 +49,6 @@ YARN Federation improvements
 [YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation 
improvements.
 
 We have enhanced the YARN Federation functionality for improved usability. The 
enhanced features are as follows:
-
 1. YARN Router now boasts a full implementation of all interfaces including 
the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and 
RMWebServiceProtocol.
 2. YARN Router now supports application cleanup and automatic offline mechanisms for subClusters.
 3. Code improvements were undertaken for the Router and AMRMProxy, along with 
enhancements to previously pending functionalities.
@@ -70,33 +61,25 @@ HDFS RBF: Code Enhancements, New Features, and Bug Fixes
 ----------------------------------------
 
 The HDFS RBF functionality has undergone significant enhancements, 
encompassing over 200 commits for feature
-
 improvements, new functionalities, and bug fixes.
-
 Important features and improvements are as follows:
 
 **Feature**
 
-[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation 
balance tool introduces a new HDFS federation balance tool to balance data 
across different federation
-
-namespaces. It uses Distcp to copy data from the source path to the target 
path.
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) introduces the HDFS Federation balance tool, a new tool to balance data across different federation namespaces.
 
 **Improvement**
 
 [HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: 
SQLDelegationTokenSecretManager should use version of tokens updated by other 
routers.
 
 The SQLDelegationTokenSecretManager enhances performance by maintaining 
processed tokens in memory. However, there is
-
 a potential issue of router cache inconsistency due to token loading and 
renewal. This issue has been addressed by the
-
 resolution of HDFS-17128.
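
The core idea of the fix, sketched generically (all names below are illustrative, not the actual `SQLDelegationTokenSecretManager` internals): before trusting a cached token, compare its version with the one persisted in SQL and reload the entry if another router has updated it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VersionedTokenCacheSketch {
  record TokenInfo(long renewDate, int version) {}

  private final Map<String, TokenInfo> cache = new ConcurrentHashMap<>();

  // Stand-ins for SQL queries against the shared token store.
  private int versionInSql(String tokenId) { return 2; }
  private TokenInfo loadFromSql(String tokenId) { return new TokenInfo(9_999L, 2); }

  public TokenInfo get(String tokenId) {
    TokenInfo cached = cache.get(tokenId);
    // If another router renewed the token, the SQL version is ahead of the
    // cached one: drop the stale entry and reload from SQL.
    if (cached == null || cached.version() < versionInSql(tokenId)) {
      cached = loadFromSql(tokenId);
      cache.put(tokenId, cached);
    }
    return cached;
  }
}
```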
 
 [HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: 
SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
 
 SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens 
from SQL in a memory cache with a short TTL,
-
 faces an issue where expired tokens are not efficiently cleaned up, leading to 
a buildup of expired tokens in the SQL database.
-
 This issue has been addressed by the resolution of HDFS-17148.
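
Conceptually the fix amounts to purging rows whose expiry has passed on a schedule; a generic sketch (the JDBC URL, table, and column names are assumptions, not the real schema):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExpiredTokenCleanerSketch {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(() -> {
      try (Connection conn = DriverManager.getConnection("jdbc:mysql://host/db");
           PreparedStatement ps =
               conn.prepareStatement("DELETE FROM Tokens WHERE renewDate < ?")) {
        ps.setLong(1, System.currentTimeMillis());   // drop already-expired rows
        ps.executeUpdate();
      } catch (Exception e) {
        e.printStackTrace();   // keep the scheduler alive on transient failures
      }
    }, 1, 1, TimeUnit.HOURS);
  }
}
```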
 
 **Others**
@@ -115,29 +98,19 @@ Important improvements and bugs are as follows:
 [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve 
performance of decommissioning dn with many ec blocks.
 
 In an HDFS cluster with a lot of EC blocks, decommissioning a DN is very slow. The reason is that, unlike replicated blocks, which can be copied
-
 from any DN holding a replica, EC blocks have to be replicated from the decommissioning DN itself.
-
 The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
-
 the replication speed, but increasing them puts the whole cluster's network at risk. So a new
-
 configuration should be added to limit the decommissioning DN, distinguished from the cluster-wide max-streams limit.
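
For reference, both keys are ordinary hdfs-site.xml settings; a minimal sketch of what they control (the values are illustrative, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;

public class ReplicationStreamLimits {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Soft cap on concurrent replication streams scheduled per DataNode.
    conf.setInt("dfs.namenode.replication.max-streams", 10);
    // Hard cap that even high-priority work (e.g. decommission) cannot exceed.
    conf.setInt("dfs.namenode.replication.max-streams-hard-limit", 20);
  }
}
```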
 
-[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block 
reconstruction pending timeout refreshable to increase decommission performance.
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) EC: Allow block 
reconstruction pending timeout refreshable to increase decommission performance.
 
 In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
-
 performance of a decommissioning DN that has a lot of EC blocks. Besides this, we also need to decrease the value of
-
 `dfs.namenode.reconstruction.pending.timeout-sec` (default 5 minutes) to shorten the interval for checking
-
 pendingReconstructions; otherwise the decommissioning node sits idle waiting for copy tasks for most of those 5 minutes.
-
 During decommissioning, we may need to reconfigure these two parameters several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
-
 `dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
-
 the `dfs.namenode.reconstruction.pending.timeout-sec` parameter also needs to be dynamically reconfigurable; see the sketch below.
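
On a live cluster these keys are edited in hdfs-site.xml and applied with `hdfs dfsadmin -reconfig namenode <host:ipc_port> start`; the sketch below (illustrative values, not recommendations) shows the tuning pattern described above:

```java
import org.apache.hadoop.conf.Configuration;

public class DecommissionTuningSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // While decommissioning a DN full of EC blocks: raise the per-DN stream
    // hard limit and shorten the pendingReconstruction check interval
    // (default 300 seconds) so the DN is not idle between copy-task batches.
    conf.setInt("dfs.namenode.replication.max-streams-hard-limit", 40);
    conf.setInt("dfs.namenode.reconstruction.pending.timeout-sec", 60);

    // Once decommissioning finishes, restore conservative values to cap the
    // cluster-wide network load.
    conf.setInt("dfs.namenode.replication.max-streams-hard-limit", 4);
    conf.setInt("dfs.namenode.reconstruction.pending.timeout-sec", 300);
  }
}
```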
 
 **Bug**
@@ -145,35 +118,22 @@ the `dfs.namenode.reconstruction.pending.timeout-sec` 
parameter also need to be
 [HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only one DN will fail when the rack number is equal with replication.
 
 In the scenario below, decommission will fail with the `TOO_MANY_NODES_ON_RACK` reason:
-
 - An EC policy is enabled, such as RS-6-3-1024k (see the sketch after this list).
-
 - The number of racks in the cluster is equal to or less than the replication width (9).
-
 - A rack has only one DN, and that DN is decommissioned.
-
 This issue has been addressed by the resolution of HDFS-16456.
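
For context, RS-6-3-1024k stripes each block group across 9 DataNodes (6 data + 3 parity), and, roughly speaking, EC block placement wants those 9 storages spread across racks, which cannot be re-satisfied while draining a single-DN rack. A minimal sketch of enabling such a policy (the cluster URI and path are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class EcPolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs =
        (DistributedFileSystem) new Path("hdfs://ns1/").getFileSystem(conf);
    dfs.enableErasureCodingPolicy("RS-6-3-1024k");   // built-in striped policy
    dfs.setErasureCodingPolicy(new Path("/striped"), "RS-6-3-1024k");
  }
}
```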
 
 [HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in 
block recovery when there are stale datanodes.
-
 During block recovery, the `RecoveryTaskStriped` in the DataNode expects a one-to-one correspondence between
-
 `rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat,
-
 this correspondence may be disrupted. Specifically, stale locations are filtered out of `recoveryLocations`, but the block indices
-
 array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect
-
 internal block ID, leading to a failure in the recovery process, as the corresponding DataNode cannot locate the replica.
-
 This issue has been addressed by the resolution of HDFS-17094.
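
The underlying pitfall is generic: two parallel arrays must be filtered together, or index `i` stops referring to the same DataNode in both. An illustrative sketch (not the actual HDFS code):

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelArrayFilterSketch {
  public static void main(String[] args) {
    String[] locations = {"dn1", "dn2", "dn3"};   // suppose dn2 is stale
    byte[] blockIndices = {0, 1, 2};              // EC block index per location

    List<String> liveLocations = new ArrayList<>();
    List<Byte> liveIndices = new ArrayList<>();
    for (int i = 0; i < locations.length; i++) {
      if (!"dn2".equals(locations[i])) {          // stand-in for a staleness check
        liveLocations.add(locations[i]);
        liveIndices.add(blockIndices[i]);         // filter both arrays in step
      }
    }
    // Prints [dn1, dn3] [0, 2]. Filtering only the locations, as in the bug,
    // would pair dn3 with index 1 and derive a wrong internal block ID.
    System.out.println(liveLocations + " " + liveIndices);
  }
}
```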
 
 [HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.
-
 Due to an integer overflow in the calculation of numReplicationTasks or 
numEcReplicatedTasks, the NameNode's configuration
-
 parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take 
effect. This led to an excessive number of tasks
-
 being sent to the DataNodes, consequently occupying too much of their memory.
 
 This issue has been addressed by the resolution of HDFS-17284.
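
As a generic illustration of this bug class (not the actual NameNode code): an `int` product can wrap to a negative value, which then slips past any upper-bound check, so the configured limit never takes effect. Computing in `long` (or with `Math.multiplyExact`) avoids the wrap-around.

```java
public class IntOverflowSketch {
  public static void main(String[] args) {
    int nodes = 1_000;
    int tasksPerNode = 3_000_000;                 // contrived to force overflow

    int tasks = nodes * tasksPerNode;             // 3e9 wraps past Integer.MAX_VALUE
    System.out.println(tasks);                    // prints -1294967296

    long safeTasks = (long) nodes * tasksPerNode; // widen before multiplying
    System.out.println(safeTasks);                // prints 3000000000
  }
}
```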


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org
