[
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507132#comment-16507132
]
Steve Loughran commented on HADOOP-15407:
-----------------------------------------
Followup: got this working from the {{hadoop fs}} command, though we need to
understand/sort out the packaging there.
* The ASF distro doesn't put the hadoop-azure connector & its dependencies into
the hadoop-common lib dir, but into the hadoop-tools lib dir, and that doesn't
get onto the classpath for the {{hadoop fs}} command. Yet the {{hadoop fs s3a://}}
operations do work out of the box, so I need to understand more of what goes on
there. (A possible classpath workaround is sketched below, after the stack trace.)
* Even with the hadoop-azure, azure-sdk & htrace artifacts copied from
hadoop-tools to the hadoop-common lib dir, I got a CNFE for htrace:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/fasterxml/jackson/core/JsonProcessingException
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getDeclaredConstructors(Class.java:2020)
    at com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
    at com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
    at com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:658)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:882)
    at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:805)
    at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:282)
    at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:214)
    at com.google.inject.internal.InjectorImpl.getInternalFactory(InjectorImpl.java:890)
    at com.google.inject.internal.FactoryProxy.notify(FactoryProxy.java:46)
    at com.google.inject.internal.ProcessedBindingData.runCreationListeners(ProcessedBindingData.java:50)
    at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:134)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
    at com.google.inject.Guice.createInjector(Guice.java:96)
    at com.google.inject.Guice.createInjector(Guice.java:73)
    at com.google.inject.Guice.createInjector(Guice.java:62)
    at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.<init>(ServiceProviderImpl.java:43)
    at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.create(ServiceProviderImpl.java:60)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:102)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.fasterxml.jackson.core.JsonProcessingException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Had to go
{code}
cp ./share/hadoop/yarn/timelineservice/lib/htrace-core-3.1.0-incubating.jar share/hadoop/common/lib
{code}
I don't get this, as the dependencies are set up (it's a 'compile' dep), so I'm
not sure why it isn't ending up in the hadoop-tools lib dir. Again, clearly
something packaging related.
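For the record, a possible way to avoid copying JARs around by hand would be the
optional-tools mechanism in the Hadoop 3 shell scripts. This is only a sketch of
what I mean, not something I've verified covers the htrace dependency:
{code}
# in etc/hadoop/hadoop-env.sh: ask the launcher scripts to put the azure tool
# module and its libs from share/hadoop/tools/lib onto the classpath
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

# or, as a one-off, prepend the tools lib dir explicitly
export HADOOP_CLASSPATH="${HADOOP_HOME}/share/hadoop/tools/lib/*:${HADOOP_CLASSPATH}"

# check what actually ends up on the classpath
hadoop classpath
{code}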
I'm not sure how to test this stuff except manually; we don't have any
integration tests of the actual packaged code (yet), though it's probably
possible via some new test suite in the hadoop-dist packaging, or something
downstream/nearby. I know all the Hadoop stack vendors will have some tests
for this, but that's testing their packaging, not the base ASF stuff...
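What I have in mind by a packaged-distribution smoke test is roughly the
following (purely illustrative: the tarball name, account and container are
hypothetical, and the credential setup is omitted):
{code}
#!/usr/bin/env bash
# untar the built distribution into a scratch dir and run one FS shell command
# against a (hypothetical) ABFS account; fail fast on any error
set -euo pipefail

DIST_TAR=hadoop-dist/target/hadoop-3.2.0-SNAPSHOT.tar.gz
WORKDIR=$(mktemp -d)

tar -xzf "${DIST_TAR}" -C "${WORKDIR}"
export HADOOP_HOME="${WORKDIR}/hadoop-3.2.0-SNAPSHOT"
export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

# account credentials would have to be injected into etc/hadoop/core-site.xml here
"${HADOOP_HOME}/bin/hadoop" fs -ls abfs://[email protected]/
{code}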
> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch,
> HADOOP-15407-003.patch, HADOOP-15407-004.patch,
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch,
> HADOOP-15407-patch-atop-patch-007.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data
> and Analytics workloads against Azure Storage. This is a complete rewrite of
> the previous WASB driver with a heavy focus on optimizing both performance
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an
> implementation for accessing blobs in Azure Storage. The scheme abfs is used
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following
> URI scheme is used to address individual paths:
>
> abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>
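>
> A minimal usage sketch of that scheme from the FS shell (illustrative only;
> the "data" container and "example" account are hypothetical, and the
> credential configuration is omitted):
> {code}
> hadoop fs -ls abfss://[email protected]/
> hadoop fs -copyFromLocal results.csv abfss://[email protected]/datasets/
> {code}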
>
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in
> pure maintenance mode, and customers should upgrade to ABFS once it hits
> General Availability later in CY18.
> Benefits of ABFS include:
> * Higher scale (capacity, throughput, and IOPS) for Big Data and Analytics
> workloads, by allowing higher limits on storage accounts
> * Removing any ramp-up time with Storage backend partitioning; blocks are now
> automatically sharded across partitions in the Storage backend
> ** This avoids the need for temporary/intermediate files, which increase the
> cost (and framework complexity) of committing jobs/tasks
> * Enabling much higher read and write throughput on single files (tens of
> Gbps by default)
> * Still retaining all of the Azure Blob features customers are familiar with
> and expect, and gaining the benefits of future Blob features as well
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system
> throughput and operations. Ambari metrics are not currently implemented for
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali,
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh,
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures, including Hadoop file system
> contract tests, unit testing, functional testing, and manual testing. All the
> JUnit tests provided with the driver can run in either sequential or parallel
> fashion in order to reduce the testing time.
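>
> For illustration, a parallel run would look roughly like this (hedged: this
> borrows the parallel-test switches from the hadoop-aws module and assumes the
> Azure test credentials are already configured; the exact flags for
> hadoop-azure may differ):
> {code}
> # from hadoop-tools/hadoop-azure
> mvn clean verify                                        # sequential run
> mvn clean verify -Dparallel-tests -DtestsThreadCount=8  # parallel run
> {code}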
> Besides unit tests, we have used ABFS as the default file system in Azure
> HDInsight. Azure HDInsight will very soon offer ABFS as a storage option.
> (HDFS is also used, but not as the default file system.) Various customer and
> test workloads have been run against clusters with such configurations for
> quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark
> SQL, and others have been run to do scenario, performance, and functional
> testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our
> production environment.