[
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389707#comment-15389707
]
Steve Loughran commented on SPARK-7481:
---------------------------------------
Sad but true.
* The PR I've put up adds the hadoop-aws and dependent JARs for Hadoop 2.7 and
2.8 to the -assembly JAR, and in spark-2, to
* I'm avoiding 2.6 as S3A is essentially not production ready at that point.
* For 2.7, HADOOP-12636 needs to be in there to handle the case of hadoop-aws
on the classpath but the Amazon SDK *not on the classpath*. It catches the
classloader exception and downgrades. Without that, while the packaging works,
anyone who left out the Amazon JAR would see a stack trace. This will be in
Hadoop 2.7.3; until then this patch isn't something I'd advocate adding,
especially not to a 1.6 branch.
Cloudera watchers: make sure you've cherry-picked HADOOP-12636...
Here's the listing of spark-2 + SPARK-7481 built against yesterday's Hadoop
branch-2.8. The new ones are hadoop-aws, hadoop-azure, azure-storage-2.2.0.jar,
aws-java-sdk-kms-1.10.6.jar, aws-java-sdk-s3-1.10.6.jar and
jackson-dataformat-cbor-2.6.5.jar. AWS needs that last one; it's explicitly
declared and forced to be the same version as the rest of Spark's Jackson JARs.
{code}
$ ls -1 dist/jars/
RoaringBitmap-0.5.11.jar
activation-1.1.1.jar
antlr4-runtime-4.5.3.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.4.0-b34.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
arpack_combined_all-0.1.jar
avro-1.7.7.jar
avro-ipc-1.7.7.jar
avro-mapred-1.7.7-hadoop2.jar
aws-java-sdk-core-1.10.6.jar
aws-java-sdk-kms-1.10.6.jar
aws-java-sdk-s3-1.10.6.jar
azure-storage-2.2.0.jar
base64-2.3.8.jar
bcprov-jdk15on-1.51.jar
breeze-macros_2.11-0.11.2.jar
breeze_2.11-0.11.2.jar
chill-java-0.8.0.jar
chill_2.11-0.8.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.2.jar
commons-compiler-2.7.8.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.3.2.jar
commons-math3-3.4.1.jar
commons-net-2.2.jar
compress-lzf-1.0.3.jar
core-1.1.2.jar
curator-client-2.6.0.jar
curator-framework-2.6.0.jar
curator-recipes-2.6.0.jar
gson-2.2.4.jar
guava-14.0.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.8.0-SNAPSHOT.jar
hadoop-auth-2.8.0-SNAPSHOT.jar
hadoop-aws-2.8.0-SNAPSHOT.jar
hadoop-azure-2.8.0-SNAPSHOT.jar
hadoop-client-2.8.0-SNAPSHOT.jar
hadoop-common-2.8.0-SNAPSHOT.jar
hadoop-hdfs-client-2.8.0-SNAPSHOT.jar
hadoop-mapreduce-client-app-2.8.0-SNAPSHOT.jar
hadoop-mapreduce-client-common-2.8.0-SNAPSHOT.jar
hadoop-mapreduce-client-core-2.8.0-SNAPSHOT.jar
hadoop-mapreduce-client-jobclient-2.8.0-SNAPSHOT.jar
hadoop-mapreduce-client-shuffle-2.8.0-SNAPSHOT.jar
hadoop-openstack-2.8.0-SNAPSHOT.jar
hadoop-yarn-api-2.8.0-SNAPSHOT.jar
hadoop-yarn-client-2.8.0-SNAPSHOT.jar
hadoop-yarn-common-2.8.0-SNAPSHOT.jar
hadoop-yarn-server-common-2.8.0-SNAPSHOT.jar
hadoop-yarn-server-web-proxy-2.8.0-SNAPSHOT.jar
hk2-api-2.4.0-b34.jar
hk2-locator-2.4.0-b34.jar
hk2-utils-2.4.0-b34.jar
htrace-core4-4.0.1-incubating.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
ivy-2.4.0.jar
jackson-annotations-2.6.5.jar
jackson-core-2.6.5.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.6.5.jar
jackson-dataformat-cbor-2.6.5.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-paranamer-2.6.5.jar
jackson-module-scala_2.11-2.6.5.jar
jackson-xc-1.9.13.jar
janino-2.7.8.jar
java-xmlbuilder-1.0.jar
javassist-3.18.1-GA.jar
javax.annotation-api-1.2.jar
javax.inject-1.jar
javax.inject-2.4.0-b34.jar
javax.servlet-api-3.1.0.jar
javax.ws.rs-api-2.0.1.jar
jaxb-api-2.2.2.jar
jcip-annotations-1.0.jar
jcl-over-slf4j-1.7.16.jar
jersey-client-2.22.2.jar
jersey-common-2.22.2.jar
jersey-container-servlet-2.22.2.jar
jersey-container-servlet-core-2.22.2.jar
jersey-guava-2.22.2.jar
jersey-media-jaxb-2.22.2.jar
jersey-server-2.22.2.jar
jets3t-0.9.3.jar
jetty-6.1.26.jar
jetty-util-6.1.26.jar
joda-time-2.9.3.jar
json-smart-1.1.1.jar
json4s-ast_2.11-3.2.11.jar
json4s-core_2.11-3.2.11.jar
json4s-jackson_2.11-3.2.11.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
jtransforms-2.4.0.jar
jul-to-slf4j-1.7.16.jar
kryo-shaded-3.0.3.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
lz4-1.3.0.jar
mail-1.4.7.jar
mesos-0.21.1-shaded-protobuf.jar
metrics-core-3.1.2.jar
metrics-graphite-3.1.2.jar
metrics-json-3.1.2.jar
metrics-jvm-3.1.2.jar
minlog-1.3.0.jar
mx4j-3.0.2.jar
netty-3.8.0.Final.jar
netty-all-4.0.29.Final.jar
nimbus-jose-jwt-3.9.jar
objenesis-2.1.jar
okhttp-2.4.0.jar
okio-1.4.0.jar
opencsv-2.3.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.1.jar
paranamer-2.6.jar
parquet-column-1.7.0.jar
parquet-common-1.7.0.jar
parquet-encoding-1.7.0.jar
parquet-format-2.3.0-incubating.jar
parquet-generator-1.7.0.jar
parquet-hadoop-1.7.0.jar
parquet-jackson-1.7.0.jar
pmml-model-1.2.15.jar
pmml-schema-1.2.15.jar
protobuf-java-2.5.0.jar
py4j-0.10.1.jar
pyrolite-4.9.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
scala-reflect-2.11.8.jar
scala-xml_2.11-1.0.2.jar
scalap-2.11.8.jar
slf4j-api-1.7.16.jar
slf4j-log4j12-1.7.16.jar
snappy-java-1.1.2.4.jar
spark-catalyst_2.11-2.0.0.jar
spark-cloud_2.11-2.0.0.jar
spark-core_2.11-2.0.0.jar
spark-graphx_2.11-2.0.0.jar
spark-launcher_2.11-2.0.0.jar
spark-mllib-local_2.11-2.0.0.jar
spark-mllib_2.11-2.0.0.jar
spark-network-common_2.11-2.0.0.jar
spark-network-shuffle_2.11-2.0.0.jar
spark-repl_2.11-2.0.0.jar
spark-sketch_2.11-2.0.0.jar
spark-sql_2.11-2.0.0.jar
spark-streaming_2.11-2.0.0.jar
spark-tags_2.11-2.0.0.jar
spark-unsafe_2.11-2.0.0.jar
spark-yarn_2.11-2.0.0.jar
spire-macros_2.11-0.7.4.jar
spire_2.11-0.7.4.jar
stax-api-1.0-2.jar
stream-2.7.0.jar
univocity-parsers-2.1.1.jar
validation-api-1.1.0.Final.jar
xbean-asm5-shaded-4.4.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.jar
{code}
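A listing like this can be checked mechanically. The following is only a rough sketch, not part of the PR: the helper names (`parse_jars`, `missing_artifacts`, `jackson2_versions`) are made up for illustration, the sample is a small excerpt of the listing above, and treating "Jackson 2.x" as any `jackson-*` artifact whose version starts with `2.` is a heuristic assumption.

```python
import re
from collections import defaultdict

# Matches "<artifact>-<version>.jar", where the version starts with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")

# Artifacts the comment names as newly added (versions left out on purpose).
NEW_ARTIFACTS = [
    "hadoop-aws",
    "hadoop-azure",
    "azure-storage",
    "aws-java-sdk-kms",
    "aws-java-sdk-s3",
    "jackson-dataformat-cbor",
]

# A small excerpt of the dist/jars listing, used here as sample input.
SAMPLE = [
    "aws-java-sdk-core-1.10.6.jar",
    "aws-java-sdk-kms-1.10.6.jar",
    "aws-java-sdk-s3-1.10.6.jar",
    "azure-storage-2.2.0.jar",
    "hadoop-aws-2.8.0-SNAPSHOT.jar",
    "hadoop-azure-2.8.0-SNAPSHOT.jar",
    "jackson-annotations-2.6.5.jar",
    "jackson-core-2.6.5.jar",
    "jackson-core-asl-1.9.13.jar",
    "jackson-databind-2.6.5.jar",
    "jackson-dataformat-cbor-2.6.5.jar",
    "jackson-module-scala_2.11-2.6.5.jar",
]


def parse_jars(names):
    """Map artifact name -> set of versions for every parseable JAR name."""
    jars = defaultdict(set)
    for name in names:
        match = JAR_NAME.match(name.strip())
        if match:
            jars[match.group("artifact")].add(match.group("version"))
    return dict(jars)


def missing_artifacts(jars, required):
    """Return the required artifacts that do not appear in the listing."""
    return sorted(a for a in required if a not in jars)


def jackson2_versions(jars):
    """Collect the distinct versions of jackson-* artifacts on the 2.x line."""
    versions = set()
    for artifact, found in jars.items():
        if artifact.startswith("jackson-"):
            versions.update(v for v in found if v.startswith("2."))
    return versions


if __name__ == "__main__":
    jars = parse_jars(SAMPLE)
    print("missing:", missing_artifacts(jars, NEW_ARTIFACTS))
    print("jackson 2.x versions:", sorted(jackson2_versions(jars)))
```

To run it against a real build, pass `os.listdir("dist/jars")` to `parse_jars` instead of `SAMPLE`. More than one entry in the Jackson 2.x set would suggest the CBOR JAR has drifted from the rest.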
[~nchammas]: if you get a chance, do try applying the PR to Spark and building
with Hadoop 2.7.2. Provided you have the right JARs everything will work, and
if it doesn't, I need to know. To get the speedup you have to build Hadoop 2.8
and everything in HADOOP-11694 that's been committed.
> Add spark-cloud module to pull in aws+azure object store FS accessors; test
> integration
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-7481
> URL: https://issues.apache.org/jira/browse/SPARK-7481
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 1.3.1
> Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, swift & azure, the
> dependencies of spark in a 2.6+ profile need to add the relevant object store
> packages (hadoop-aws, hadoop-openstack, hadoop-azure).
> This adds more stuff to the client bundle, but will mean a single spark
> package can talk to all of the stores.