This is an automated email from the ASF dual-hosted git repository.

stevel pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/hadoop.git


The following commit(s) were added to refs/heads/trunk by this push:
     new bbb17e76a7a8 HADOOP-19178: [WASB Deprecation] Updating Documentation on Upcoming Plans for Hadoop-Azure (#6862)
bbb17e76a7a8 is described below

commit bbb17e76a7a8a995a8b202c9b9530f39bb2a2957
Author: Anuj Modi <128447756+anujmodi2...@users.noreply.github.com>
AuthorDate: Fri Jun 7 18:58:24 2024 +0530

    HADOOP-19178: [WASB Deprecation] Updating Documentation on Upcoming Plans for Hadoop-Azure (#6862)
    
    
    Contributed by Anuj Modi
---
 .../hadoop-azure/src/site/markdown/index.md        |  1 +
 .../hadoop-azure/src/site/markdown/wasb.md         | 97 ++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/index.md b/hadoop-tools/hadoop-azure/src/site/markdown/index.md
index 595353896d12..177ab282c112 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/index.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/index.md
@@ -18,6 +18,7 @@
 
 See also:
 
+* [WASB](./wasb.html)
 * [ABFS](./abfs.html)
 * [Testing](./testing_azure.html)
 
diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md b/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
new file mode 100644
index 000000000000..270fd14da4c4
--- /dev/null
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
@@ -0,0 +1,97 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Hadoop Azure Support: WASB Driver
+
+## Introduction
+The WASB driver is a legacy Hadoop File System driver that was developed to support
+[FNS (Flat Namespace) Azure Storage accounts](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
+that do not honor file-folder syntax.
+The WASB driver therefore mimics HDFS folder operations on the client side, and
+certain folder operations such as Rename and Delete can generate a large number of IOPs through
+client-side enumeration and orchestration of the rename/delete operation blob by blob.
+It was not ideal for other APIs either, because the initial check of whether a path is a file or a folder
+has to be made over multiple metadata calls. Together, these led to degraded performance.
+
+To provide better service to analytics users, Microsoft released [ADLS Gen2](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction),
+whose storage accounts are HNS (Hierarchical Namespace) enabled, i.e. file-folder aware.
+The ABFS driver was designed to overcome the inherent deficiencies of WASB, and users
+were advised to migrate to the ABFS driver.
+
+### Challenges and limitations of WASB Driver
+Users of the legacy WASB driver face a number of challenges and limitations:
+1. They cannot leverage the optimizations and benefits of the latest ABFS driver.
+2. They need to deal with compatibility issues if files and folders are
+modified with the legacy WASB driver and the ABFS driver concurrently during a phased
+transition.
+3. The features supported over the ABFS driver differ between FNS and HNS accounts.
+4. In certain cases, they must perform a significant amount of re-work on their
+workloads to migrate to the ABFS driver, which is available only on HNS enabled
+accounts in a fully tested and supported scenario.
+
+## Deprecation plans for WASB Driver
+We are introducing a new feature that will enable the ABFS driver to support
+FNS accounts (over BlobEndpoint that WASB Driver uses) using the ABFS scheme.
+This feature will enable us to use the ABFS driver to interact with data stored in GPv2
+(General Purpose v2) storage accounts.
+
+With this feature, the users who still use the legacy WASB driver will be able
+to migrate to the ABFS driver without much re-work on their workloads. They will
+however need to change the URIs from the WASB scheme to the ABFS scheme.
+
+Once the ABFS driver has built the FNS support needed to migrate WASB users, the WASB
+driver will be marked for removal in the next major release. This will remove any ambiguity
+for newly onboarding users, as there will be only one Microsoft driver for Azure Storage,
+and migrating users will get SLA-bound support for the driver and service,
+which was not guaranteed over WASB.
+
+We anticipate that this feature will serve as a stepping stone for users to
+move to HNS enabled accounts with the ABFS driver, which is our recommended stack
+for big data analytics on ADLS Gen2.
+
+### Impact for existing ABFS users using ADLS Gen2 (HNS enabled account)
+This feature does not impact the existing users who are using ADLS Gen2 Accounts
+(HNS enabled account) with ABFS driver.
+
+They do not need to make any changes to their workloads or configurations. They
+will still enjoy the benefits of HNS, such as atomic operations, fine-grained
+access control, scalability, and performance.
+
+### Official recommendation
+Microsoft continues to recommend that all Big Data and Analytics users use
+Azure Data Lake Gen2 (ADLS Gen2) with the ABFS driver and will continue to optimize
+this scenario in the future. We believe that this new option will help all those
+users to transition to a supported scenario immediately, while they plan to
+ultimately move to ADLS Gen2 (HNS enabled account).
+
+### New Authentication Options for a migrating user
+The following authentication types that WASB provides will continue to work on the new FNS over
+ABFS driver, through configuration that accepts these authentication types (similar to WASB):
+1. SharedKey
+2. Account SAS
+3. Service/Container SAS
+
+The following authentication types, which were not supported by the WASB driver but are supported by
+the ABFS driver, will continue to be available for the new FNS over ABFS driver:
+1. OAuth 2.0 Client Credentials
+2. OAuth 2.0: Refresh Token
+3. Azure Managed Identity
+4. Custom OAuth 2.0 Token Provider
+
+Refer to [ABFS Authentication](abfs.html/authentication) for more details.
+
+### ABFS Features Not Available for migrating Users
+Certain features of ABFS Driver will be available only to users using HNS accounts with ABFS driver.
+1. ABFS Driver's SAS Token Provider plugin for UserDelegation SAS and Fixed SAS.
+2. Client Provided Encryption Key (CPK) support for Data ingress and egress.
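
Note (not part of the commit): the scheme change that the deprecation plan asks of migrating users can be sketched as below. This is a minimal Python illustration under stated assumptions. The helper name `to_abfs_uri` and the sample URI are hypothetical, and the sketch assumes that only the scheme changes (`wasb` to `abfs`, `wasbs` to `abfss`) while the container, account host, and path stay untouched. Whether the account endpoint also has to change is not stated in the document above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed mapping between the plain and TLS variants of each driver's scheme.
SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}


def to_abfs_uri(uri: str) -> str:
    """Return uri with a WASB scheme swapped for the matching ABFS scheme.

    Hypothetical helper: only the scheme is rewritten; the container,
    account host, and path are copied through unchanged.
    """
    parts = urlsplit(uri)
    new_scheme = SCHEME_MAP.get(parts.scheme)
    if new_scheme is None:
        raise ValueError(f"not a WASB URI: {uri!r}")
    return urlunsplit(parts._replace(scheme=new_scheme))


if __name__ == "__main__":
    # Sample URI is illustrative only.
    print(to_abfs_uri("wasbs://container@account.blob.core.windows.net/data/part-0000"))
```

Rejecting any non-WASB scheme with an error, rather than passing it through, makes it harder to miss a URI that was already migrated or was never a WASB path.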

