[
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981602#comment-15981602
]
Steve Loughran commented on SPARK-7481:
---------------------------------------
One thing I want to emphasise here is: I have no loyalty to my code. I just
want packagings of Spark, and applications pulling it in via maven/SBT, to be
able to get a consistent set of artifacts needed to successfully interact with
object stores as a source of data. Most of the spark/object store integration
work I can do elsewhere, such as in Apache Bahir or on github. It's just the
classpath setup which you can't really do downstream, as it depends on getting
the combination of spark, hadoop, aws-sdk, jackson, etc. 100% consistent.
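To make that consistency problem concrete: a downstream build trying to do this itself ends up pinning something like the following. This is an illustrative Maven fragment only; the exact versions depend on the Spark/Hadoop combination in use and are assumptions, not the actual matrix.

```xml
<!-- Illustrative only: these versions must match what the Spark/Hadoop
     distribution actually ships, or S3A fails at runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <!-- the hadoop-aws 2.7.x line was built against this aws-java-sdk version -->
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
</dependency>
<dependency>
  <!-- must agree with the Jackson version Spark itself pulls in -->
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.6.5</version>
</dependency>
```

Get any one of these out of sync and you see classpath failures at runtime, which is why this can only be fixed where the whole dependency graph is visible.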
That's all I care about. And I don't care if someone else does it, as long as
the patch works for current and future versions of Hadoop and the AWS SDK. If
someone else does it, I'll gladly test that stuff downstream.
But Spark does need that integration. It had some in the past, when s3n was
implemented in hadoop-common, but that's been gone since the code was moved
out to hadoop-aws in Hadoop 2.6. I personally think it should go back in;
implicitly, so does everyone whose downstream spark-based product includes a
set of cloud storage clients and JARs 100% in sync with the rest of their
product's artifacts.
So: does anyone have any alternative designs? The easiest would be to add the
dependencies to spark-core itself, but that has consequences if people ship
anything built on the shaded AWS JAR (the one which fixes its jackson
inconsistencies internally), as it adds tens of MB to everything pulling in
spark-core. A separate module is the way to manage this, which is pretty much
all the final version of the patch is.
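As a sketch of what consuming a separate module could look like downstream (the artifact name is taken from the JIRA title; the Scala suffix and version property are assumptions):

```xml
<!-- Hypothetical: one dependency pulls in hadoop-aws, hadoop-openstack,
     hadoop-azure and their transitive dependencies, all version-aligned
     with the Spark release itself. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hadoop-cloud_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
```

The point of the module is that the version alignment is done once, inside the Spark build, rather than by every downstream project independently.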
> Add spark-hadoop-cloud module to pull in object store support
> -------------------------------------------------------------
>
> Key: SPARK-7481
> URL: https://issues.apache.org/jira/browse/SPARK-7481
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 2.1.0
> Reporter: Steve Loughran
>
> To keep the s3n classpath right, to add s3a, swift & azure, the dependencies
> of spark in a 2.6+ profile need to add the relevant object store packages
> (hadoop-aws, hadoop-openstack, hadoop-azure)