[
https://issues.apache.org/jira/browse/HADOOP-19039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810239#comment-17810239
]
ASF GitHub Bot commented on HADOOP-19039:
-----------------------------------------
Hexiaoqiao commented on code in PR #6462:
URL: https://github.com/apache/hadoop/pull/6462#discussion_r1464418233
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
Apache Hadoop ${project.version}
================================
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release
branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release
branch.
Overview of Changes
===================
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A:
Upgrade AWS SDK to V2
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including
the ability to work with Amazon S3 Express One Zone Storage - the new high
performance, single AZ storage class.
-Consult the parent JIRA
[HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one
FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance especially for Federation
deploy all the time although there are different improvement,
+because of the global coarse-grain lock.
+These series issues (include
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534),
[HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511),
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and
[HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).)
+try to split the global coarse-grain lock to fine-grain lock which is double
level lock for blockpool and volume,
+to improve the throughput and avoid lock impacts between blockpools and
volumes.
+
+YARN Federation improvements
+----------------------------------------
+
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation
improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The
enhanced features are as follows:
+1. YARN Router now boasts a full implementation of all interfaces including
the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and
RMWebServiceProtocol.
+2. YARN Router support for application cleanup and automatic offline
mechanisms for subCluster.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with
enhancements to previously pending functionalities.
+4. Audit logs and Metrics for Router received upgrades.
+5. A boost in cluster security features was achieved, with the inclusion of
Kerberos support.
+6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on
SubClusters and Policies.
+
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
+The HDFS RBF functionality has undergone significant enhancements,
encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
-Vectored IO API
----------------
+**Feature**
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation
Review Comment:
[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation namespaces.
->
[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS
Federation balance tool introduces one tool to balance data across different
namespaces.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
Apache Hadoop ${project.version}
================================
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release
branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release
branch.
Overview of Changes
===================
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A:
Upgrade AWS SDK to V2
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
Review Comment:
This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
->
This release upgrades Hadoop's AWS connector S3A from AWS SDK for Java V1 to
AWS SDK for Java V2.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
Apache Hadoop ${project.version}
================================
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release
branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release
branch.
Overview of Changes
===================
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A:
Upgrade AWS SDK to V2
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including
the ability to work with Amazon S3 Express One Zone Storage - the new high
performance, single AZ storage class.
-Consult the parent JIRA
[HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one
FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance especially for Federation
deploy all the time although there are different improvement,
+because of the global coarse-grain lock.
+These series issues (include
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534),
[HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511),
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and
[HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429).)
+try to split the global coarse-grain lock to fine-grain lock which is double
level lock for blockpool and volume,
+to improve the throughput and avoid lock impacts between blockpools and
volumes.
+
+YARN Federation improvements
+----------------------------------------
+
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation
improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The
enhanced features are as follows:
+1. YARN Router now boasts a full implementation of all interfaces including
the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and
RMWebServiceProtocol.
+2. YARN Router support for application cleanup and automatic offline
mechanisms for subCluster.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with
enhancements to previously pending functionalities.
+4. Audit logs and Metrics for Router received upgrades.
+5. A boost in cluster security features was achieved, with the inclusion of
Kerberos support.
+6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on
SubClusters and Policies.
+
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
+----------------------------------------
+The HDFS RBF functionality has undergone significant enhancements,
encompassing over 200 commits for feature
+improvements, new functionalities, and bug fixes.
+Important features and improvements are as follows:
-Vectored IO API
----------------
+**Feature**
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) Federation
balance tool introduces a new HDFS federation balance tool to balance data
across different federation
+namespaces. It uses Distcp to copy data from the source path to the target
path.
-The `PositionedReadable` interface has now added an operation for
-Vectored IO (also known as Scatter/Gather IO):
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF:
SQLDelegationTokenSecretManager should use version of tokens updated by other
routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining
processed tokens in memory. However, there is
+a potential issue of router cache inconsistency due to token loading and
renewal. This issue has been addressed by the
+resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF:
SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens
from SQL in a memory cache with a short TTL,
+faces an issue where expired tokens are not efficiently cleaned up, leading to
a buildup of expired tokens in the SQL database.
+This issue has been addressed by the resolution of HDFS-17148.
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command line, and other improvements.
Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
+----------------------------------------
-```java
-void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer>
allocate)
-```
+HDFS EC has made code improvements and fixed some bugs.
-All the requested ranges will be retrieved into the supplied byte buffers
-possibly asynchronously,
-possibly in parallel, with results potentially coming in out-of-order.
+Important improvements and bugs are as follows:
-1. The default implementation uses a series of `readFully()` calls, so delivers
- equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads
than `readFully()`.
-3. The S3A filesystem issues parallel HTTP GET requests in different threads.
+**Improvement**
-Benchmarking of enhanced Apache ORC and Apache Parquet clients through
`file://` and `s3a://`
-show significant improvements in query performance.
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve
performance of decommissioning dn with many ec blocks.
-Further Reading:
-*
[FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
-* [Hadoop Vectored IO: Your Data Just Got
Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html)
- Apachecon 2022 talk.
+In a hdfs cluster with a lot of EC blocks, decommission a dn is very slow. The
reason is unlike replication blocks can be replicated
+from any dn which has the same block replication, the ec block have to be
replicated from the decommissioning dn.
+The configurations `dfs.namenode.replication.max-streams` and
`dfs.namenode.replication.max-streams-hard-limit` will limit
+the replication speed, but increase these configurations will create risk to
the whole cluster's network. So it should add a new
+configuration to limit the decommissioning dn, distinguished from the cluster
wide max-streams limit.
-Mapreduce: Manifest Committer for Azure ABFS and google GCS
-----------------------------------------------------------
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block
reconstruction pending timeout refreshable to increase decommission performance.
Review Comment:
Allow block reconstruction pending timeout refreshable to increase
decommission performance.
->
EC: Allow block reconstruction pending timeout refreshable to increase
decommission performance.
##########
hadoop-project/src/site/markdown/index.md.vm:
##########
@@ -15,103 +15,143 @@
Apache Hadoop ${project.version}
================================
-Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release
branch.
+Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release
branch.
Overview of Changes
===================
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
-Azure ABFS: Critical Stream Prefetch Fix
+S3A: Upgrade AWS SDK to V2
----------------------------------------
-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A:
Upgrade AWS SDK to V2
-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+This release of Hadoop moves the S3A connector to Amazon S3 to the V2 SDK.
+This is a significant change which offers a number of new features including
the ability to work with Amazon S3 Express One Zone Storage - the new high
performance, single AZ storage class.
-Consult the parent JIRA
[HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one
FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance evaluation for DataNode instance.
+However, it does not reach the best performance especially for Federation
deploy all the time although there are different improvement,
Review Comment:
Please remove the redundant blank space between 'performance' and
'especially'.
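For readers of the hunk above, here is a minimal sketch of what "double level lock for blockpool and volume" could look like. It is not the HDFS-15382 implementation: the class `TwoLevelLockSketch`, its method names, and the string keys are hypothetical and used only for illustration. The assumption is that a volume-scoped operation takes the block pool lock in shared mode and the volume lock exclusively, while a pool-wide operation takes the block pool lock exclusively.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical two-level lock sketch; not the Hadoop DataNode API. */
class TwoLevelLockSketch {
    // One read-write lock per block pool, and one per (block pool, volume) pair.
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> poolLocks =
        new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> volumeLocks =
        new ConcurrentHashMap<>();

    private ReentrantReadWriteLock poolLock(String bpid) {
        return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    }

    private ReentrantReadWriteLock volumeLock(String bpid, String volume) {
        return volumeLocks.computeIfAbsent(bpid + "/" + volume,
            k -> new ReentrantReadWriteLock());
    }

    /** Volume-scoped work: shared pool lock first, then exclusive volume lock. */
    <T> T withVolumeWriteLock(String bpid, String volume, Supplier<T> action) {
        ReentrantReadWriteLock pool = poolLock(bpid);
        ReentrantReadWriteLock vol = volumeLock(bpid, volume);
        pool.readLock().lock();
        try {
            vol.writeLock().lock();
            try {
                return action.get();
            } finally {
                vol.writeLock().unlock();
            }
        } finally {
            pool.readLock().unlock();
        }
    }

    /** Pool-wide work: exclusive pool lock excludes every volume operation in that pool. */
    <T> T withPoolWriteLock(String bpid, Supplier<T> action) {
        ReentrantReadWriteLock pool = poolLock(bpid);
        pool.writeLock().lock();
        try {
            return action.get();
        } finally {
            pool.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        TwoLevelLockSketch locks = new TwoLevelLockSketch();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // A background thread holds vol1 until the main thread lets it go.
        Thread holder = new Thread(() -> locks.<Void>withVolumeWriteLock("bp1", "vol1", () -> {
            held.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return null;
        }));
        holder.start();
        held.await();

        // vol2 in the same block pool is acquired while vol1 is still held.
        String result = locks.withVolumeWriteLock("bp1", "vol2", () -> "vol2 updated");
        System.out.println(result);

        release.countDown();
        holder.join();
    }
}
```

With a single global lock, the write to `vol2` would wait for the thread holding `vol1`. In this sketch it proceeds immediately, which is the kind of cross-volume and cross-blockpool contention the quoted text says the change is meant to avoid.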
> Hadoop 3.4.0 Highlight big features and improvements.
> -----------------------------------------------------
>
> Key: HADOOP-19039
> URL: https://issues.apache.org/jira/browse/HADOOP-19039
> Project: Hadoop Common
> Issue Type: Improvement
> Components: common
> Affects Versions: 3.4.0
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
>
> While preparing for the release of Hadoop-3.4.0, I've noticed the inclusion
> of numerous commits in this version. Therefore, highlighting significant
> features and improvements becomes crucial. I've completed the initial
> version and now seek review from more experienced partners to finalize
> the version's highlights.