[
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507132#comment-16507132
]
Steve Loughran commented on HADOOP-15407:
-----------------------------------------
Followup: got this working from the {{hadoop fs}} command, though we need to
understand/sort out the packaging there.
* The ASF distro doesn't put the hadoop-azure connector & its dependencies into
the hadoop-common lib dir, but into the hadoop-tools lib dir, and that doesn't
get onto the classpath for the {{hadoop fs}} command. Yet the {{hadoop fs s3a://}}
operations do work out of the box, so I need to understand more of what goes on
there. (A possible classpath workaround is sketched below, after the stack trace.)
* Even with the hadoop-azure, azure-sdk & htrace artifacts copied from
hadoop-tools to the hadoop-common lib dir, I got a CNFE for htrace:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/fasterxml/jackson/core/JsonProcessingException
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getDeclaredConstructors(Class.java:2020)
    at com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
    at com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
    at com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:658)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:882)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:805)
    at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:282)
    at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:214)
    at com.google.inject.internal.InjectorImpl.getInternalFactory(InjectorImpl.java:890)
    at com.google.inject.internal.FactoryProxy.notify(FactoryProxy.java:46)
    at com.google.inject.internal.ProcessedBindingData.runCreationListeners(ProcessedBindingData.java:50)
    at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:134)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
    at com.google.inject.Guice.createInjector(Guice.java:96)
    at com.google.inject.Guice.createInjector(Guice.java:73)
    at com.google.inject.Guice.createInjector(Guice.java:62)
    at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.<init>(ServiceProviderImpl.java:43)
    at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.create(ServiceProviderImpl.java:60)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:102)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.fasterxml.jackson.core.JsonProcessingException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Had to go
{code}
cp ./share/hadoop/yarn/timelineservice/lib/htrace-core-3.1.0-incubating.jar share/hadoop/common/lib
{code}
I don't get this, as the dependencies are set up (it's a 'compile' dep), so I'm
not sure why it isn't ending up in the hadoop-tools lib dir. Again, clearly
something packaging related.
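For the record, a possible way to avoid copying JARs around by hand would be the
optional-tools mechanism in the Hadoop 3 shell scripts. This is only a sketch of
what I mean, not something I've verified covers the htrace dependency:
{code}
# in etc/hadoop/hadoop-env.sh: ask the launcher scripts to put the azure tool
# module and its libs from share/hadoop/tools/lib onto the classpath
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

# or, as a one-off, prepend the tools lib dir explicitly
export HADOOP_CLASSPATH="${HADOOP_HOME}/share/hadoop/tools/lib/*:${HADOOP_CLASSPATH}"

# check what actually ends up on the classpath
hadoop classpath
{code}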
I'm not sure how to test this stuff except manually; we don't have any
integration tests of the actual packaged code (yet), though it's probably
possible via some new test suite in the hadoop-dist packaging, or something
downstream/nearby. I know all the Hadoop stack vendors will have some tests
for this, but that's testing their packaging, not the base ASF stuff...
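What I have in mind by a packaged-distribution smoke test is roughly the
following (purely illustrative: the tarball name, account and container are
hypothetical, and the credential setup is omitted):
{code}
#!/usr/bin/env bash
# untar the built distribution into a scratch dir and run one FS shell command
# against a (hypothetical) ABFS account; fail fast on any error
set -euo pipefail

DIST_TAR=hadoop-dist/target/hadoop-3.2.0-SNAPSHOT.tar.gz
WORKDIR=$(mktemp -d)

tar -xzf "${DIST_TAR}" -C "${WORKDIR}"
export HADOOP_HOME="${WORKDIR}/hadoop-3.2.0-SNAPSHOT"
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

# account credentials would have to be injected into etc/hadoop/core-site.xml here
"${HADOOP_HOME}/bin/hadoop" fs -ls abfs://[email protected]/
{code}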
> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch,
> HADOOP-15407-003.patch, HADOOP-15407-004.patch,
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch,
> HADOOP-15407-patch-atop-patch-007.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data
> and Analytics workloads against Azure Storage. This is a complete rewrite of
> the previous WASB driver with a heavy focus on optimizing both performance
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an
> implementation for accessing blobs in Azure Storage. The scheme abfs is used
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following
> URI scheme is used to address individual paths:
>
> abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>
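>
> A minimal usage sketch of that scheme from the FS shell (illustrative only;
> the "data" container and "example" account are hypothetical, and the
> credential configuration is omitted):
> {code}
> hadoop fs -ls abfss://[email protected]/
> hadoop fs -copyFromLocal results.csv abfss://[email protected]/datasets/
> {code}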
>
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in
> pure maintenance mode, and customers should upgrade to ABFS once it hits
> General Availability later in CY18.
> Benefits of ABFS include:
> * Higher scale (capacity, throughput, and IOPS) for Big Data and Analytics
> workloads, by allowing higher limits on storage accounts
> * Removing any ramp-up time with Storage backend partitioning; blocks are now
> automatically sharded across partitions in the Storage backend
> ** This avoids the need for temporary/intermediate files, which increase the
> cost (and framework complexity) of committing jobs/tasks
> * Enabling much higher read and write throughput on single files (tens of
> Gbps by default)
> * Still retaining all of the Azure Blob features customers are familiar with
> and expect, and gaining the benefits of future Blob features as well
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system
> throughput and operations. Ambari metrics are not currently implemented for
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali,
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh,
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures, including Hadoop file system
> contract tests, unit testing, functional testing, and manual testing. All the
> JUnit tests provided with the driver can run in either sequential or parallel
> fashion in order to reduce the testing time.
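>
> For illustration, a parallel run would look roughly like this (hedged: this
> borrows the parallel-test switches from the hadoop-aws module and assumes the
> Azure test credentials are already configured; the exact flags for
> hadoop-azure may differ):
> {code}
> # from hadoop-tools/hadoop-azure
> mvn clean verify                                        # sequential run
> mvn clean verify -Dparallel-tests -DtestsThreadCount=8  # parallel run
> {code}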
> Besides unit tests, we have used ABFS as the default file system in Azure
> HDInsight. Azure HDInsight will very soon offer ABFS as a storage option.
> (HDFS is also used, but not as the default file system.) Various customer and
> test workloads have been run against clusters with such configurations for
> quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark
> SQL, and others have been run to do scenario, performance, and functional
> testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our
> production environment.