[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085477#comment-15085477 ]
Steve Loughran commented on SPARK-7481:
---------------------------------------

Josh, there is a 2.6 profile, but all it currently does is bump up the dependencies of other things (jets3t, curator, etc.). It doesn't pull in hadoop-aws, which is where the s3a and s3n stuff lives, or the Amazon JAR which is needed for s3a to work (the fact that s3n moved to the new JAR was something somebody else did; I'd probably have vetoed it if I'd noticed).

The Amazon JAR in Hadoop 2.6, `aws-java-sdk`, is huge, and not something you'd want in the Spark assembly. Hadoop 2.7+ has switched to the leaner aws-java-sdk-s3; HADOOP-12269 has shown how that's been a bit brittle across versions. Pulling all the Amazon SDK bits into the assembly JAR is something that could be done if targeting Hadoop 2.7+, but you'd need care to make sure that the exact Amazon lib that Hadoop was built against is used.

It'd be easier if:

# `bin/spark-class` (and, transitively, things like the YARN launcher) grabbed *.jar from the Spark lib dir, so all people would need to do is drop in the appropriate AWS JAR (or, for Azure, the MSFT Azure JAR)
# the 2.6 profile added hadoop-aws (and hadoop-openstack) to the dependencies of the spark assembly
# a 2.7 profile added hadoop-azure

That is: the Hadoop code is used (all fairly thin), but the third-party JARs are left out. This would mean the assembly had all the Hadoop stuff, and all people needed to do was drop the external JARs into the lib directory.

What do you think?
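Roughly, the profile change in (2) could look something like the following POM fragment (a sketch only; the profile id, version property, and exclusion list are illustrative, not the actual Spark build files):

```xml
<!-- Hypothetical sketch of a hadoop-2.6 profile that pulls in the
     object store connector modules while excluding the third-party
     SDK JAR, so users can drop their own copy into the lib dir. -->
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- leave out the large Amazon SDK; see discussion above -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-openstack</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</profile>
```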
> Add Hadoop 2.6+ profile to pull in object store FS accessors
> ------------------------------------------------------------
>
>                 Key: SPARK-7481
>                 URL: https://issues.apache.org/jira/browse/SPARK-7481
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.1
>            Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, swift & azure, the dependencies
> of spark in a 2.6+ profile need to add the relevant object store packages
> (hadoop-aws, hadoop-openstack, hadoop-azure)
>
> this adds more stuff to the client bundle, but will mean a single spark
> package can talk to all of the stores.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)