[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085477#comment-15085477 ]

Steve Loughran commented on SPARK-7481:
---------------------------------------

Josh,

There is a 2.6 profile, but all it currently does is bump up the versions of 
other dependencies (jets3t, curator, etc.). It doesn't pull in hadoop-aws, which 
is where the s3a and s3n code lives, or the Amazon JAR which s3a needs in order 
to work (the fact that s3n moved to the new JAR was something somebody else did; 
I'd probably have vetoed it if I'd noticed).
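
For reference, the existing profile is roughly just a set of property overrides, 
something like this (property names and version numbers here are illustrative, 
not copied from Spark's pom.xml):

{code:xml}
<!-- rough sketch of today's hadoop-2.6 profile: version overrides only,
     no object-store modules; values are illustrative -->
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.0</hadoop.version>
    <jets3t.version>0.9.3</jets3t.version>
    <curator.version>2.6.0</curator.version>
  </properties>
</profile>
{code}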

The Amazon JAR in Hadoop 2.6, `aws-java-sdk`, is huge, and not something you'd 
want in the Spark assembly. Hadoop 2.7+ has switched to the leaner 
`aws-java-sdk-s3`; HADOOP-12269 has shown how that's been a bit brittle across 
versions.

Pulling all the Amazon SDK bits into the assembly JAR is something that could be 
done when targeting Hadoop 2.7+, but you'd need to take care that the exact 
Amazon library Hadoop was built against is the one that gets used.
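
If that route were taken, one way to keep things in lock-step would be a managed 
dependency pinned to whatever SDK version the chosen Hadoop release was built 
and tested with; sketch only, the version below is a placeholder, not a 
recommendation:

{code:xml}
<!-- sketch: pin the S3 SDK to the exact version the targeted Hadoop release
     was built and tested against (the number here is a placeholder) -->
<properties>
  <aws.java.sdk.version>1.7.4</aws.java.sdk.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-s3</artifactId>
      <version>${aws.java.sdk.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}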

It'd be easier if
# `bin/spark-class` (and, transitively, things like the YARN launcher) grabbed 
*.jar from the Spark lib dir, so all people would need to do is drop in the 
appropriate AWS JAR (or, for Azure, the Microsoft Azure JAR)
# the 2.6 profile added hadoop-aws (and hadoop-openstack) to the dependencies of 
the Spark assembly
# a 2.7 profile added hadoop-azure

That is: the Hadoop code (all of it fairly thin) is included, but the 
third-party JARs are left out.
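
As a sketch of what those profile additions might look like (the Hadoop artifact 
IDs are the real module names; the exclusions and the use of ${hadoop.version} 
are illustrative and would need checking against the actual Hadoop POMs):

{code:xml}
<profile>
  <id>hadoop-2.6</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- keep the bulky Amazon SDK out of the assembly;
             users drop that JAR into lib/ themselves -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-openstack</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</profile>

<profile>
  <id>hadoop-2.7</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-azure</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- likewise, leave the Microsoft storage SDK for users to drop in -->
        <exclusion>
          <groupId>com.microsoft.azure</groupId>
          <artifactId>azure-storage</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
{code}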

This would mean the assembly had all the Hadoop stuff, and all people would need 
to do is drop the external JARs into the lib directory.

What do you think?

> Add Hadoop 2.6+ profile to pull in object store FS accessors
> ------------------------------------------------------------
>
>                 Key: SPARK-7481
>                 URL: https://issues.apache.org/jira/browse/SPARK-7481
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.1
>            Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, Swift & Azure, the dependencies 
> of Spark in a 2.6+ profile need to add the relevant object store packages 
> (hadoop-aws, hadoop-openstack, hadoop-azure).
> This adds more stuff to the client bundle, but will mean a single Spark 
> package can talk to all of the stores.


