[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

Sean Mackrory (JIRA) Tue, 18 Sep 2018 07:10:14 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619161#comment-16619161
 ]


Sean Mackrory commented on HADOOP-15407:
----------------------------------------

I'm comfortable with that. The changes outside of the module (and even outside 
of the new package in the module) are very few in number and they're quite 
trivial. I've done some testing with MapReduce, Hive and Spark (not on the 
exact current branch state, but recently with most of it) and all issues found 
have been fixed.

{quote}However vote + merge needs another week time as per my 
understanding{quote}

According to the by-laws (https://hadoop.apache.org/bylaws.html), we can merge 
as soon as we have 3 +1's from active committers and there's consensus. In this 
case, the 7 day convention is a courtesy. Once we have 3 +1's, if we'd like to 
wrap it up I think it's reasonable to ask if anybody would like more time (like 
the weekend) to evaluate it and consider consensus reached if nobody says so.

{quote}
Consensus approval of active committers, but with a minimum of one +1. The code 
can be committed after the first +1, unless the code change represents a merge 
from a branch, in which case three +1s are required.
..
Votes relating to code changes are not subject to a strict timetable but should 
be made as timely as possible.
{quote}


> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
>                 Key: HADOOP-15407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15407
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/azure
>    Affects Versions: 3.2.0
>            Reporter: Esfandiar Manii
>            Assignee: Da Zhou
>            Priority: Blocker
>         Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}·         Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}·         Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}·         Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}·         Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

Reply via email to