Re: [PR] HADOOP-19039. Hadoop 3.4.0 Highlight big features and improvements. [hadoop]

via GitHub Tue, 23 Jan 2024 23:30:53 -0800


Hexiaoqiao commented on code in PR #6462:
URL: https://github.com/apache/hadoop/pull/6462#discussion_r1464418233



##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
 Apache Hadoop ${project.version}
 ================================
 
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.
 
 Overview of Changes
 ===================
 
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
 
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
 
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including the ability to work with Amazon S3 Express One Zone Storage - the new high performance, single AZ storage class.
 
-Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance  especially for Federation deploy all the time although there are different improvement,
+because of the global coarse-grain lock.
+These series issues (include [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).)
+try to split the global coarse-grain lock to fine-grain lock which is double level lock for blockpool and volume,
+to improve the throughput and avoid lock impacts between blockpools and volumes.
+
+YARN Federation improvements
+----------------------------------------
+
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:
+1. YARN Router now boasts a full implementation of all interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
+2. YARN Router support for application cleanup and automatic offline mechanisms for subCluster.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
+4. Audit logs and Metrics for Router received upgrades.
+5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
+6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on SubClusters and Policies.
+
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
 
+The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
 
-Vectored IO API
----------------
+**Feature**
 
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation balance tool introduces a new HDFS federation balance tool to balance data across different federation

Review Comment:
   [HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation balance tool introduces a new HDFS federation balance tool to balance data across different federation namespaces.
   ->
   [HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS Federation balance tool introduces one tool to balance data across different namespaces.



##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
 Apache Hadoop ${project.version}
 ================================
 
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.
 
 Overview of Changes
 ===================
 
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
 
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
 
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.

Review Comment:
   This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
   ->
   This release upgrades Hadoop's AWS connector S3A from AWS SDK for Java V1 to AWS SDK for Java V2.



##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
 Apache Hadoop ${project.version}
 ================================
 
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.
 
 Overview of Changes
 ===================
 
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
 
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
 
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including the ability to work with Amazon S3 Express One Zone Storage - the new high performance, single AZ storage class.
 
-Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance  especially for Federation deploy all the time although there are different improvement,
+because of the global coarse-grain lock.
+These series issues (include [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).)
+try to split the global coarse-grain lock to fine-grain lock which is double level lock for blockpool and volume,
+to improve the throughput and avoid lock impacts between blockpools and volumes.
+
+YARN Federation improvements
+----------------------------------------
+
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:
+1. YARN Router now boasts a full implementation of all interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
+2. YARN Router support for application cleanup and automatic offline mechanisms for subCluster.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
+4. Audit logs and Metrics for Router received upgrades.
+5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
+6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on SubClusters and Policies.
+
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
 
+The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
 
-Vectored IO API
----------------
+**Feature**
 
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation balance tool introduces a new HDFS federation balance tool to balance data across different federation
+namespaces. It uses Distcp to copy data from the source path to the target path.
 
-The `PositionedReadable` interface has now added an operation for
-Vectored IO (also known as Scatter/Gather IO):
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is
+a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the
+resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL,
+faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database.
+This issue has been addressed by the resolution of HDFS-17148.
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command line, and other improvements. Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
+----------------------------------------
 
-```java
-void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
-```
+HDFS EC has made code improvements and fixed some bugs.
 
-All the requested ranges will be retrieved into the supplied byte buffers 
-possibly asynchronously,
-possibly in parallel, with results potentially coming in out-of-order.
+Important improvements and bugs are as follows:
 
-1. The default implementation uses a series of `readFully()` calls, so delivers
-   equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`.
-3. The S3A filesystem issues parallel HTTP GET requests in different threads.
+**Improvement**
 
-Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
-show significant improvements in query performance.
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.
 
-Further Reading:
-* [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
-* [Hadoop Vectored IO: Your Data Just Got Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html)
-  Apachecon 2022 talk.
+In a hdfs cluster with a lot of EC blocks, decommission a dn is very slow. The reason is unlike replication blocks can be replicated
+from any dn which has the same block replication, the ec block have to be replicated from the decommissioning dn.
+The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` will limit
+the replication speed, but increase these configurations will create risk to the whole cluster's network. So it should add a new
+configuration to limit the decommissioning dn, distinguished from the cluster wide max-streams limit.
 
-Mapreduce: Manifest Committer for Azure ABFS and google GCS
-----------------------------------------------------------
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block reconstruction pending timeout refreshable to increase decommission performance.

Review Comment:
   Allow block reconstruction pending timeout refreshable to increase decommission performance.
   ->
   EC: Allow block reconstruction pending timeout refreshable to increase decommission performance.



##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
 Apache Hadoop ${project.version}
 ================================
 
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.
 
 Overview of Changes
 ===================
 
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
 
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
 
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including the ability to work with Amazon S3 Express One Zone Storage - the new high performance, single AZ storage class.
 
-Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance  especially for Federation deploy all the time although there are different improvement,

Review Comment:
   Please remove the redundant blank space between 'performance' and 'especially'.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

