[ https://issues.apache.org/jira/browse/HADOOP-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878560#comment-16878560 ]
Jose Luis Pedrosa commented on HADOOP-16408:
--------------------------------------------
1) These are the distribution binaries of Hadoop 3.2.0
([https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz])
2) The Azure jars (like the jars of the other cloud providers) live only in
`/hadoop-3.2.0/share/hadoop/tools/lib/`, and that directory is not exported as a
`tools/lib/*` wildcard entry. So I have to build the classpath for Spark by hand,
like this (I'm not sure Spark tolerates the `*`, since it's handled at code level):
{code:bash}
# The Azure connector jars live only in tools/lib, which `hadoop classpath` does not export
HADOOP_TOOLS=/opt/hadoop/share/hadoop/tools/lib
EXTRA_JARS=${HADOOP_TOOLS}/hadoop-azure-3.2.0.jar:${HADOOP_TOOLS}/azure-storage-7.0.0.jar:${HADOOP_TOOLS}/azure-keyvault-core-1.0.0.jar:${HADOOP_TOOLS}/wildfly-openssl-1.0.4.Final.jar
# Append them to the Hadoop classpath that Spark picks up
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath):${EXTRA_JARS}
{code}
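A version-agnostic variant of the same workaround, sketched as a POSIX shell helper under my own assumptions: the function name `build_extra_jars` and the glob patterns are hypothetical, not something Hadoop or Spark provides. The `hadoop-azure-[0-9]*.jar` pattern is meant to pick up `hadoop-azure-3.2.0.jar` without also matching `hadoop-azure-datalake-3.2.0.jar`.

```bash
# Hypothetical helper (not part of Hadoop or Spark): print a colon-separated
# list of the jars in directory $1 whose file names match any of the glob
# patterns passed as the remaining arguments, in argument order.
# The body runs in a subshell so its variables do not leak into the caller.
build_extra_jars() (
  dir=$1
  shift
  result=
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for jar in "$dir"/$pattern; do
      # An unmatched glob expands to itself, so skip paths that do not exist.
      [ -e "$jar" ] || continue
      result=${result:+$result:}$jar
    done
  done
  printf '%s\n' "$result"
)

# Example wiring, assuming the /opt/hadoop layout used above:
# HADOOP_TOOLS=/opt/hadoop/share/hadoop/tools/lib
# EXTRA_JARS=$(build_extra_jars "$HADOOP_TOOLS" 'hadoop-azure-[0-9]*.jar' \
#   'azure-storage-*.jar' 'azure-keyvault-core-*.jar' 'wildfly-openssl-*.jar')
# export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath):${EXTRA_JARS}
```

The patterns are quoted at the call site so they reach the function unexpanded and are only globbed against the tools directory.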
The ABFS file system is implemented in the jars that are not exported:
1) Without even looking at the code, just grepping the default core configuration
shows that the class implementing the ABFS file system is
`org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem`:
{noformat}
grep -b2 -a2 'org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem' ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
58641-<property>
58652-  <name>fs.abfs.impl</name>
58680:  <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
58747-  <description>The implementation class of the Azure Blob Filesystem</description>
58830-</property>
{noformat}
2) If we look for that .class file, again by grepping:
{noformat}
unzip -l share/hadoop/tools/lib/hadoop-azure-3.2.0.jar | grep org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem
 1295 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem$1.class
 1166 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem$2.class
 1323 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem$FileSystemOperation.class
29560 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.class
 2304 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore$VersionedFileStatus.class
35744 01-08-2019 07:41 org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.class
{noformat}
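The grep above treats the dotted class name as a regular expression: each `.` matches any character (which is what lets it match the `/` separators), and because it is an unanchored substring match it also hits `AzureBlobFileSystem$1`, `AzureBlobFileSystemStore` and so on. A stricter check, sketched as hypothetical POSIX shell helpers (the names `class_to_entry` and `jars_containing_class` are mine, not Hadoop's), converts the class name into its exact jar entry and compares it against the last column of `unzip -l`:

```bash
# Hypothetical helpers, not part of Hadoop.
# A fully qualified class name maps to a jar entry by turning every "." into
# "/" and appending ".class".
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Print every jar in directory $1 that contains the exact entry for class $2.
# Relies on the `unzip` command, as in the listing above.
jars_containing_class() (
  entry=$(class_to_entry "$2")
  for jar in "$1"/*.jar; do
    # An unmatched glob expands to itself, so skip paths that do not exist.
    [ -e "$jar" ] || continue
    # `unzip -l` prints the entry name as the last field of each row; awk exits 0
    # only when some row's last field equals the entry exactly.
    if unzip -l "$jar" | awk -v e="$entry" '$NF == e { found = 1 } END { exit !found }'; then
      printf '%s\n' "$jar"
    fi
  done
)

# Example, assuming the tarball layout above:
# jars_containing_class share/hadoop/tools/lib org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem
```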
3) As the classpath listing shows, neither
`hadoop/tools/lib/hadoop-azure-3.2.0.jar` nor `share/hadoop/tools/lib/*`
is exported.
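To make point 3 mechanical, here is a sketch of a check against the colon-separated output of `hadoop classpath`, assuming the usual JVM rule that a `dir/*` entry stands for the jar files directly inside `dir`. The function name `classpath_covers_jar` is hypothetical, and it compares paths as plain strings, so the jar path must use the same prefix as the classpath output (symlinks are not resolved).

```bash
# Hypothetical helper, not part of Hadoop: succeed if the colon-separated
# classpath in $1 would put the jar at path $2 on the JVM classpath, either as
# an explicit entry or through a "<jar's directory>/*" wildcard entry.
# Paths are compared as plain strings; symlinks are not resolved.
classpath_covers_jar() (
  classpath=$1
  jar=$2
  jar_dir=${jar%/*}
  # Disable globbing so wildcard entries such as ".../common/lib/*" are not
  # expanded by the shell, then split the classpath on ":".
  set -f
  IFS=:
  for entry in $classpath; do
    if [ "$entry" = "$jar" ] || [ "$entry" = "$jar_dir/*" ]; then
      exit 0
    fi
  done
  exit 1
)

# Example, assuming the tarball is unpacked at /hadoop-3.2.0 as in the listing:
# classpath_covers_jar "$(/hadoop-3.2.0/bin/hadoop classpath)" \
#   /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-azure-3.2.0.jar \
#   && echo exported || echo "not exported"
```

Against the listing in the issue description, this would report `hadoop-azure-datalake-3.2.0.jar` as exported and `hadoop-azure-3.2.0.jar` as not exported.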
If you'd still like me to run that tool, could you please give me a bit more
detail on what you want me to run?
Thanks!
> `hadoop classpath` does not export azure jars, only data lake
> -------------------------------------------------------------
>
> Key: HADOOP-16408
> URL: https://issues.apache.org/jira/browse/HADOOP-16408
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Jose Luis Pedrosa
> Priority: Minor
>
> `hadoop classpath` does not export the Azure jars, only the Azure Data Lake jars.
> This means that Spark applications will not be able to use abfs URLs (among
> other things).
> When running:
> {code}
> hadoop-3.2.0/bin/hadoop classpath
> {code}
> It will output all the jars for *hadoop-azure-datalake*, but will not output
> the jars for *hadoop-azure*, which are the ones that actually contain the file
> system implementation for the new ABFS.
> {noformat}
> /hadoop-3.2.0/etc/hadoop:
> /hadoop-3.2.0/share/hadoop/common/lib/*:
> /hadoop-3.2.0/share/hadoop/common/*:
> /hadoop-3.2.0/share/hadoop/tools/lib/aliyun-sdk-oss-2.8.3.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/jdom-1.1.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-aliyun-3.2.0.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-aws-3.2.0.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/azure-data-lake-store-sdk-2.2.9.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-azure-datalake-3.2.0.jar:
> /hadoop-3.2.0/share/hadoop/hdfs:
> /hadoop-3.2.0/share/hadoop/hdfs/lib/*:
> /hadoop-3.2.0/share/hadoop/hdfs/*:
> /hadoop-3.2.0/share/hadoop/tools/lib/kafka-clients-0.8.2.1.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/lz4-1.2.0.jar:
> /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-kafka-3.2.0.jar:
> /hadoop-3.2.0/share/hadoop/mapreduce/lib/*:
> /hadoop-3.2.0/share/hadoop/mapreduce/*:
> /hadoop-3.2.0/share/hadoop/tools/lib/hadoop-openstack-3.2.0.jar:
> /hadoop-3.2.0/share/hadoop/yarn:
> /hadoop-3.2.0/share/hadoop/yarn/lib/*:
> /hadoop-3.2.0/share/hadoop/yarn/*
> {noformat}