[
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449125#comment-16449125
]
Devaraj Das edited comment on HADOOP-15407 at 4/24/18 12:58 AM:
----------------------------------------------------------------
[~esmanii], the patch seems to have been generated incorrectly. I'd expect this
jira to add a lot of new code, but the patch does otherwise :)
was (Author: devaraj):
[~esmanii], the patch seems to have been generated incorrectly. I'd expect this
jira is adding lot of new code, but the patch does otherwise :)
> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data
> and Analytics workloads against Azure Storage. This is a complete rewrite of
> the previous WASB driver with a heavy focus on optimizing both performance
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an
> implementation for accessing blobs in Azure Storage. The scheme abfs is used
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following
> URI scheme is used to address individual paths:
>
> abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>
>
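To make the addressing format above concrete, here is a minimal sketch that splits such a URI into its parts using only java.net.URI. The class AbfsUriExample, its parse method, and the names myfs and myaccount are hypothetical and for illustration only; this is not the ABFS driver's own parsing code.

```java
import java.net.URI;

/** Hypothetical helper, not part of the ABFS driver: splits an abfs[s] URI into its parts. */
final class AbfsUriExample {

    /** Simple value holder for the pieces of an abfs[s] URI. */
    static final class AbfsLocation {
        final boolean secure;      // true for abfss (HTTPS), false for abfs (HTTP)
        final String fileSystem;   // the <filesystem> part before '@'
        final String accountHost;  // <account>.dfs.core.windows.net
        final String path;         // the <path> part, including the leading '/'

        AbfsLocation(boolean secure, String fileSystem, String accountHost, String path) {
            this.secure = secure;
            this.fileSystem = fileSystem;
            this.accountHost = accountHost;
            this.path = path;
        }
    }

    /** Parses abfs[s]://<filesystem>@<account host>/<path>; rejects other schemes. */
    static AbfsLocation parse(String text) {
        URI uri = URI.create(text);
        String scheme = uri.getScheme();
        if (!"abfs".equals(scheme) && !"abfss".equals(scheme)) {
            throw new IllegalArgumentException("Not an ABFS URI: " + text);
        }
        // java.net.URI exposes the part before '@' as the user-info component.
        if (uri.getUserInfo() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected <filesystem>@<account host>: " + text);
        }
        return new AbfsLocation("abfss".equals(scheme), uri.getUserInfo(),
                uri.getHost(), uri.getPath());
    }

    public static void main(String[] args) {
        // "myfs" and "myaccount" are placeholder names.
        AbfsLocation loc = parse("abfss://myfs@myaccount.dfs.core.windows.net/data/part-0000");
        System.out.println(loc.secure + " " + loc.fileSystem + " "
                + loc.accountHost + " " + loc.path);
    }
}
```

In a real job the URI would normally be handed to Hadoop's FileSystem API rather than parsed by hand; the sketch only shows which component of the URI carries which piece of information.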
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in
> pure maintenance mode, and customers should upgrade to ABFS once it hits
> General Availability later in CY18.
> Benefits of ABFS include:
> · Higher scale (capacity, throughput, and IOPS) for Big Data and Analytics
>   workloads by allowing higher limits on storage accounts
> · Removing any ramp-up time with Storage backend partitioning; blocks are
>   now automatically sharded across partitions in the Storage backend
>     This avoids the need for temporary/intermediate files, which increase
>     cost (and framework complexity around committing jobs/tasks)
> · Enabling much higher read and write throughput on single files (tens of
>   Gbps by default)
> · Still retaining all of the Azure Blob features customers are familiar with
>   and expect, and gaining the benefits of future Blob features as well
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system
> throughput and operations. Ambari metrics are not currently implemented for
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali,
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh,
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures including Hadoop file system
> contract tests, unit testing, functional testing, and manual testing. All the
> JUnit tests provided with the driver can run either sequentially or in
> parallel to reduce testing time.
> Besides unit tests, we have used ABFS as the default file system in Azure
> HDInsight. Azure HDInsight will very soon offer ABFS as a storage option.
> (HDFS is also used but not as the default file system.) Various customer and
> test workloads have been run against clusters with such configurations for
> quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming, Spark
> SQL, and others have been run for scenario, performance, and functional
> testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our
> production environment.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)