Hexiaoqiao commented on code in PR #6462:
URL: https://github.com/apache/hadoop/pull/6462#discussion_r1464418233
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
Apache Hadoop ${project.version}
================================
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release
branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release
branch.
Overview of Changes
===================
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A:
Upgrade AWS SDK to V2
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including
the ability to work with Amazon S3 Express One Zone Storage - the new high
performance, single AZ storage class.
-Consult the parent JIRA
[HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one
FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance especially for Federation
deploy all the time although there are different improvement,
+because of the global coarse-grain lock.
+These series issues (include
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534),
[HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511),
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and
[HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).)
+try to split the global coarse-grain lock to fine-grain lock which is double
level lock for blockpool and volume,
+to improve the throughput and avoid lock impacts between blockpools and
volumes.
+
+YARN Federation improvements
+----------------------------------------
+
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation
improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The
enhanced features are as follows:
+1. YARN Router now boasts a full implementation of all interfaces including
the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and
RMWebServiceProtocol.
+2. YARN Router support for application cleanup and automatic offline
mechanisms for subCluster.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with
enhancements to previously pending functionalities.
+4. Audit logs and Metrics for Router received upgrades.
+5. A boost in cluster security features was achieved, with the inclusion of
Kerberos support.
+6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on
SubClusters and Policies.
+
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
+The HDFS RBF functionality has undergone significant enhancements,
encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
-Vectored IO API
----------------
+**Feature**
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation
Review Comment:
[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation namespaces.
->
[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS
Federation balance tool introduces one tool to balance data across different
namespaces.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
Review Comment:
This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
->
This release upgrades Hadoop's AWS connector S3A from AWS SDK for Java V1 to
AWS SDK for Java V2.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
+The HDFS RBF functionality has undergone significant enhancements,
encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
-Vectored IO API
----------------
+**Feature**
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation
+namespaces. It uses Distcp to copy data from the source path to the target
path.
-The `PositionedReadable` interface has now added an operation for
-Vectored IO (also known as Scatter/Gather IO):
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF:
SQLDelegationTokenSecretManager should use version of tokens updated by other
routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining
processed tokens in memory. However, there is
+a potential issue of router cache inconsistency due to token loading and
renewal. This issue has been addressed by the
+resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF:
SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens
from SQL in a memory cache with a short TTL,
+faces an issue where expired tokens are not efficiently cleaned up, leading to
a buildup of expired tokens in the SQL database.
+This issue has been addressed by the resolution of HDFS-17148.
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command line, and other improvements.
Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
+----------------------------------------
-```java
-void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer>
allocate)
-```
+HDFS EC has made code improvements and fixed some bugs.
-All the requested ranges will be retrieved into the supplied byte buffers
-possibly asynchronously,
-possibly in parallel, with results potentially coming in out-of-order.
+Important improvements and bugs are as follows:
-1. The default implementation uses a series of `readFully()` calls, so delivers
- equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads
than `readFully()`.
-3. The S3A filesystem issues parallel HTTP GET requests in different threads.
+**Improvement**
-Benchmarking of enhanced Apache ORC and Apache Parquet clients through
`file://` and `s3a://`
-show significant improvements in query performance.
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve
performance of decommissioning dn with many ec blocks.
-Further Reading:
-*
[FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
-* [Hadoop Vectored IO: Your Data Just Got
Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html)
- Apachecon 2022 talk.
+In a hdfs cluster with a lot of EC blocks, decommission a dn is very slow. The
reason is unlike replication blocks can be replicated
+from any dn which has the same block replication, the ec block have to be
replicated from the decommissioning dn.
+The configurations `dfs.namenode.replication.max-streams` and
`dfs.namenode.replication.max-streams-hard-limit` will limit
+the replication speed, but increase these configurations will create risk to
the whole cluster's network. So it should add a new
+configuration to limit the decommissioning dn, distinguished from the cluster
wide max-streams limit.
-Mapreduce: Manifest Committer for Azure ABFS and google GCS
-----------------------------------------------------------
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block
reconstruction pending timeout refreshable to increase decommission performance.
Review Comment:
Allow block reconstruction pending timeout refreshable to increase
decommission performance.
->
EC: Allow block reconstruction pending timeout refreshable to increase
decommission performance.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one
FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance especially for Federation
deploy all the time although there are different improvement,
Review Comment:
Please remove the redundant blank space between 'performance' and
'especially'.
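
For readers of this paragraph, the "double level lock for blockpool and volume" idea can be pictured with the rough sketch below. It is only an illustration under stated assumptions, not the FsDatasetImpl code from HDFS-15382: the class `TwoLevelLockSketch`, its methods, and the string keys are all hypothetical names chosen for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical two-level lock registry: one read/write lock per block pool
 * and one per (block pool, volume) pair. Not the real DataNode classes.
 */
public class TwoLevelLockSketch {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
        new ConcurrentHashMap<>();

    // Lazily create one lock per key so unrelated pools/volumes never share a lock.
    private ReentrantReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    /** Handle that releases the volume lock first, then the block-pool lock. */
    public static final class Held implements AutoCloseable {
        private final ReentrantReadWriteLock pool;
        private final ReentrantReadWriteLock volume;

        Held(ReentrantReadWriteLock pool, ReentrantReadWriteLock volume) {
            this.pool = pool;
            this.volume = volume;
        }

        @Override
        public void close() {
            volume.writeLock().unlock();
            pool.readLock().unlock();
        }
    }

    /**
     * Volume-level write: hold the block-pool lock in shared (read) mode and
     * the volume lock exclusively, so writers on other volumes are not blocked.
     */
    public Held lockVolume(String blockPoolId, String volumeId) {
        ReentrantReadWriteLock pool = lockFor(blockPoolId);
        ReentrantReadWriteLock volume = lockFor(blockPoolId + "/" + volumeId);
        pool.readLock().lock();      // level 1: block pool, shared
        volume.writeLock().lock();   // level 2: volume, exclusive
        return new Held(pool, volume);
    }

    /** True if some thread currently holds the exclusive lock for this volume. */
    public boolean isVolumeLocked(String blockPoolId, String volumeId) {
        return lockFor(blockPoolId + "/" + volumeId).isWriteLocked();
    }
}
```

With a single global lock, a write to one volume would block every other volume and block pool; in this sketch, `lockVolume("bp-1", "vol-1")` leaves `vol-2` free, which is the effect the paragraph is trying to describe.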
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]