[
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981602#comment-15981602
]
Steve Loughran commented on SPARK-7481:
---------------------------------------
One thing I want to emphasise here is: I have no loyalty to my code. I just
want packagings of Spark, and applications pulling it in via maven/SBT, to be
able to get a consistent set of artifacts needed to successfully interact with
object stores as a source of data. Most of the spark/object store integration
work I can do elsewhere, such as in Apache Bahir or on github. It's just the
classpath setup which you can't really do downstream, as it depends on getting
the combination of spark, hadoop, aws-sdk, jackson, etc. 100% consistent.
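To make that consistency problem concrete: a downstream build trying to do this itself ends up pinning something like the following. This is an illustrative Maven fragment only; the exact versions depend on the Spark/Hadoop combination in use and are assumptions, not the actual matrix.

```xml
<!-- Illustrative only: these versions must match what the Spark/Hadoop
     distribution actually ships, or S3A fails at runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <!-- the hadoop-aws 2.7.x line was built against this aws-java-sdk version -->
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
</dependency>
<dependency>
  <!-- must agree with the Jackson version Spark itself pulls in -->
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.6.5</version>
</dependency>
```

Get any one of these out of sync and you see classpath failures at runtime, which is why this can only be fixed where the whole dependency graph is visible.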
That's all I care about. And I don't care if someone else does it, as long as
the patch works for current and future versions of Hadoop and the AWS SDK. If
someone else does it, I'll gladly test that stuff downstream.
But Spark does need that integration. It had some in the past, when s3n was
implemented in hadoop-common, but that's been gone since the code was moved
out to hadoop-aws in Hadoop 2.6. I personally think it should go back in;
implicitly, so does everyone whose downstream spark-based product includes a
set of cloud storage clients and JARs 100% in sync with the rest of their
product's artifacts.
So: does anyone have any alternative designs? The easiest would be to add the
dependencies to spark-core itself, but that has consequences if people ship
anything built on the shaded AWS JAR (the one which fixes its jackson
inconsistencies internally), as it adds tens of MB to everything pulling in
spark-core. A separate module is the way to manage this, which is pretty much
all the final version of the patch is.
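As a sketch of what consuming a separate module could look like downstream (the artifact name is taken from the JIRA title; the Scala suffix and version property are assumptions):

```xml
<!-- Hypothetical: one dependency pulls in hadoop-aws, hadoop-openstack,
     hadoop-azure and their transitive dependencies, all version-aligned
     with the Spark release itself. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hadoop-cloud_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
```

The point of the module is that the version alignment is done once, inside the Spark build, rather than by every downstream project independently.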
> Add spark-hadoop-cloud module to pull in object store support
> -------------------------------------------------------------
>
> Key: SPARK-7481
> URL: https://issues.apache.org/jira/browse/SPARK-7481
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 2.1.0
> Reporter: Steve Loughran
>
> To keep the s3n classpath right, to add s3a, swift & azure, the dependencies
> of spark in a 2.6+ profile need to add the relevant object store packages
> (hadoop-aws, hadoop-openstack, hadoop-azure)