This is an automated email from the ASF dual-hosted git repository.

slfan1989 pushed a commit to branch branch-3.4.0
in repository https://gitbox.apache.org/repos/asf/hadoop.git
The following commit(s) were added to refs/heads/branch-3.4.0 by this push:
     new 8918c7156dab  Update Highlight big features and improvements.
8918c7156dab is described below

commit 8918c7156dabe855a1ca982ede87aba1423f2530
Author: Shilun Fan <slfan1...@apache.org>
AuthorDate: Mon Jan 15 12:45:40 2024 +0800

    Update Highlight big features and improvements.
---
 hadoop-project/src/site/markdown/index.md.vm | 199 +++++++++++++++++----------
 1 file changed, 127 insertions(+), 72 deletions(-)

diff --git a/hadoop-project/src/site/markdown/index.md.vm b/hadoop-project/src/site/markdown/index.md.vm
index 76eb9af83531..ad2c46d2f61f 100644
--- a/hadoop-project/src/site/markdown/index.md.vm
+++ b/hadoop-project/src/site/markdown/index.md.vm
@@ -23,109 +23,164 @@ Overview of Changes
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
-DataNode FsDatasetImpl Fine-Grained Locking via BlockPool
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
-[HDFS-15180](https://issues.apache.org/jira/browse/HDFS-15180) Split FsDatasetImpl datasetLock via blockpool to solve the issue of heavy FsDatasetImpl datasetLock
-When there are many namespaces in a large cluster.
+
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
+
+The S3A connector now uses the V2 AWS SDK. This is a significant change at the source code level.
+
+Any applications using the internal extension/override points in the filesystem connector are likely to break.
+
+Consult the document aws\_sdk\_upgrade for the full details.
+
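The scale of this change is easiest to see in application code that talks to S3 directly. Below is a minimal sketch of the same GET request in both SDK generations; this is plain AWS SDK usage with hypothetical bucket and key names, not Hadoop or S3A code, and it only illustrates why code written against the V1 classes no longer compiles against V2:

```java
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class SdkV2Sketch {
    public static void main(String[] args) throws Exception {
        // V1 style (com.amazonaws), for comparison:
        //   AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
        //   S3Object object = s3.getObject("my-bucket", "data/part-0000");

        // V2 style (software.amazon.awssdk), which Hadoop 3.4 S3A builds against:
        try (S3Client s3 = S3Client.create()) {
            GetObjectRequest request = GetObjectRequest.builder()
                    .bucket("my-bucket")    // hypothetical bucket
                    .key("data/part-0000")  // hypothetical key
                    .build();
            try (ResponseInputStream<GetObjectResponse> body = s3.getObject(request)) {
                System.out.println("content-length: " + body.response().contentLength());
            }
        }
    }
}
```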
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance metrics of a DataNode instance. However, because of a single global
+coarse-grained lock, the DataNode has not always reached its best performance, especially in Federation deployments,
+despite a number of earlier improvements.
+
+This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429)) splits the global coarse-grained lock
+into fine-grained two-level locks, one level for the block pool and one for the volume, to improve throughput and
+avoid lock contention between block pools and volumes.
 
 YARN Federation improvements
 ----------------------------------------
-[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) brings many improvements, including the following:
-1. YARN Router now boasts a full implementation of all relevant interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
-2. Enhanced support for Application cleanup and automatic offline mechanisms for SubCluster are now facilitated by the YARN Router.
-3. Code optimization for Router and AMRMProxy was undertaken, coupled with improvements to previously pending functionalities.
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:
+
+1. YARN Router now provides a full implementation of all relevant interfaces, including ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
+2. YARN Router supports application cleanup and automatic offline mechanisms for SubClusters.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
 4. Audit logs and Metrics for Router received upgrades.
 5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
 6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on SubClusters and Policies.
 
-Upgrade AWS SDK to V2
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
 ----------------------------------------
-[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073)
-The S3A connector now uses the V2 AWS SDK. This is a significant change at the source code level.
-Any applications using the internal extension/override points in the filesystem connector are likely to break.
-Consult the document aws\_sdk\_upgrade for the full details.
-Azure ABFS: Critical Stream Prefetch Fix
+
+The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+
+Important features and improvements are as follows:
+
+**Feature**
+
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation balance tool. This introduces a new HDFS federation balance tool to balance data across
+different federation namespaces. It uses DistCp to copy data from the source path to the target path.
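For orientation, here is a minimal sketch of driving the balance tool programmatically through `ToolRunner`. The `FedBalance` entry class, its package name, and the `submit`-plus-paths argument form are assumptions modeled on the tool's shell usage (`hadoop fedbalance submit <source> <target>`); the namespaces and paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
// Assumption: the Tool implementation of the federation balance tool added by HDFS-15294.
import org.apache.hadoop.tools.fedbalance.FedBalance;

public class FedBalanceSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to the shell form: hadoop fedbalance submit <source> <target>
        int rc = ToolRunner.run(conf, new FedBalance(),
                new String[] {"submit", "hdfs://ns1/data/logs", "hdfs://ns2/data/logs"});
        System.exit(rc);
    }
}
```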
+
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is
+a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the
+resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL,
+faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database.
+This issue has been addressed by the resolution of HDFS-17148.
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command line, and other improvements. Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
 ----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+HDFS EC has received code improvements and bug fixes.
+
+The important improvements and bug fixes are as follows:
+
+**Improvement**
+
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.
+
+In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. The reason is that, unlike replicated
+blocks, which can be copied from any DataNode holding a replica, EC blocks have to be copied from the decommissioning
+DataNode itself.
+
+The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
+the replication speed, but increasing them puts the whole cluster's network at risk. A new configuration is therefore
+added to throttle only the decommissioning DataNode, distinct from the cluster-wide max-streams limit.
+
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block reconstruction pending timeout refreshable to increase decommission performance.
+
+Following [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
+performance of a decommissioning DataNode that holds many EC blocks. Besides this, the value of
+`dfs.namenode.reconstruction.pending.timeout-sec` (default: 5 minutes) also needs to be decreased to shorten the interval
+for checking pendingReconstructions; otherwise the decommissioning node sits idle waiting for copy tasks for most of
+those 5 minutes.
+
+During decommissioning, these two parameters may need to be reconfigured several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
+`dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+this change makes the `dfs.namenode.reconstruction.pending.timeout-sec` parameter dynamically reconfigurable as well.
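Both JIRAs lean on the same dynamic-reconfiguration workflow: change the property in `hdfs-site.xml` on the NameNode host, then trigger and poll a live reload. A minimal sketch via `DFSAdmin` and `ToolRunner` follows; the host name and port are hypothetical, and the same steps can be run from the shell with `hdfs dfsadmin -reconfig`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class ReconfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String nn = "nn1.example.com:8020"; // hypothetical NameNode RPC address

        // Equivalent to: hdfs dfsadmin -reconfig namenode nn1.example.com:8020 start
        ToolRunner.run(conf, new DFSAdmin(conf),
                new String[] {"-reconfig", "namenode", nn, "start"});

        // The reload runs asynchronously; poll until it reports completion.
        ToolRunner.run(conf, new DFSAdmin(conf),
                new String[] {"-reconfig", "namenode", nn, "status"});
    }
}
```

The `properties` subcommand of `-reconfig` lists the keys a daemon accepts at runtime, which is a quick way to confirm that `dfs.namenode.reconstruction.pending.timeout-sec` is reconfigurable on a given release.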
+During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between -Further Reading: -* [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html). -* [Hadoop Vectored IO: Your Data Just Got Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html) - Apachecon 2022 talk. +`rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat, -Mapreduce: Manifest Committer for Azure ABFS and google GCS ----------------------------------------------------------- +this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices -The new _Intermediate Manifest Committer_ uses a manifest file -to commit the work of successful task attempts, rather than -renaming directories. -Job commit is matter of reading all the manifests, creating the -destination directories (parallelized) and renaming the files, -again in parallel. +array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect -This is both fast and correct on Azure Storage and Google GCS, -and should be used there instead of the classic v1/v2 file -output committers. +internal block ID, leading to a failure in the recovery process as the corresponding datanode cannot locate the replica. -It is also safe to use on HDFS, where it should be faster -than the v1 committer. It is however optimized for -cloud storage where list and rename operations are significantly -slower; the benefits may be less. +This issue has been addressed by the resolution of HDFS-17094. -More details are available in the -[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html). -documentation. +[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284). EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery. +Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration -HDFS: Dynamic Datanode Reconfiguration --------------------------------------- +parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks -HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457. +being sent to the DataNodes, consequently occupying too much of their memory. -A number of Datanode configuration options can be changed without having to restart -the datanode. This makes it possible to tune deployment configurations without -cluster-wide Datanode Restarts. +This issue has been addressed by the resolution of HDFS-17284. -See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361) -for the list of dynamically reconfigurable attributes. +**Others** +Other improvements and fixes for HDFS EC, Please refer to the release document. Transitive CVE fixes -------------------- @@ -133,8 +188,8 @@ Transitive CVE fixes A lot of dependencies have been upgraded to address recent CVEs. Many of the CVEs were not actually exploitable through the Hadoop so much of this work is just due diligence. 
-See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
-for the list of dynamically reconfigurable attributes.
+**Others**
+
+For other improvements and fixes to HDFS EC, please refer to the release document.
 
 Transitive CVE fixes
 --------------------
@@ -133,8 +188,8 @@ Transitive CVE fixes
 A lot of dependencies have been upgraded to address recent CVEs.
 Many of the CVEs were not actually exploitable through the Hadoop
 so much of this work is just due diligence.
-However applications which have all the library is on a class path may
-be vulnerable, and the ugprades should also reduce the number of false
+However, applications which have all of these libraries on their class path
+may be vulnerable, and the upgrades should also reduce the number of false
 positives security scanners report.
 
 We have not been able to upgrade every single dependency to the latest
@@ -170,12 +225,12 @@ can, with care, keep data and computing resources private.
 
 1. Physical cluster: *configure Hadoop security*, usually bonded to the
    enterprise Kerberos/Active Directory systems.
    Good.
 
-1. Cloud: transient or persistent single or multiple user/tenant cluster
+2. Cloud: transient or persistent single or multiple user/tenant cluster
    with private VLAN *and security*.
    Good.
    Consider [Apache Knox](https://knox.apache.org/) for managing remote
    access to the cluster.
 
-1. Cloud: transient single user/tenant cluster with private VLAN
+3. Cloud: transient single user/tenant cluster with private VLAN
    *and no security at all*.
    Requires careful network configuration as this is the sole
    means of securing the cluster..

---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org