Steve Loughran created SPARK-42537:
--------------------------------------

             Summary: Remove obsolete/superfluous imports in spark-hadoop-cloud module
                 Key: SPARK-42537
                 URL: https://issues.apache.org/jira/browse/SPARK-42537
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.4.0
            Reporter: Steve Loughran


The explicit imports into hadoop-cloud are obsolete:

* the hadoop-cloud-storage pom is a cut-down export of the bindings to the 
various cloud stores in their hadoop-* modules
* it's been shipping since hadoop 2.10
* it's grown to include cos and aliyun support
* it's fairly well tested
* it actually drops a connector (hadoop-openstack) once support for it is 
withdrawn. Hadoop 3.3.5 has done this, leaving a stub jar there just to avoid 
breaking downstream builds like spark's current setup.

hadoop-cloud-storage *should* be all that's needed.

I know that the spark hadoop-2 profile still references the (long unsupported) 
2.7.x line, but if you are using those releases then you really aren't going 
to talk to cloud infra:
* no abfs connector
* s3n connector which won't authenticate with any of the aws regions launched 
in the past 5-8 years
* gcs connector won't work (it's java11+; hadoop 3.2.x is the minimum for 
java11 clients)
* none of the new Chinese cloud services
* s3a connector is very outdated
* s3a connector uses an unshaded aws client which is unlikely to work with 
versions of jackson or httpclient written in the last 5 years, has trouble on 
java8, etc.

Proposed
* hadoop-2 profile to be the minimal hadoop-aws and hadoop-azure dependencies 
in the code today. Cutting to the empty set would be better, but a bit more 
radical
* hadoop-3 profile to pull in hadoop-cloud-storage (excluding aws sdk as 
today), *and nothing else*
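
As a rough sketch of what the hadoop-3 side of this could look like in the 
hadoop-cloud pom: the profile id, the ${hadoop.version} property and the 
aws sdk coordinates below are assumptions for illustration, not copied from 
the current build, and this is untested.

```xml
<!-- sketch only: hadoop-3 profile reduced to a single cloud dependency -->
<profile>
  <id>hadoop-3</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-cloud-storage</artifactId>
      <!-- assumed property name for the hadoop version in use -->
      <version>${hadoop.version}</version>
      <exclusions>
        <!-- assumed coordinates of the aws sdk that is excluded today -->
        <exclusion>
          <groupId>com.amazonaws</groupId>
          <artifactId>aws-java-sdk-bundle</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- no other hadoop-* cloud modules listed here -->
  </dependencies>
</profile>
```

Running mvn dependency:tree against the module with the profile active 
would be one way to check which store connectors actually arrive 
transitively.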

This will simplify everyone's life as there are fewer dependencies to 
reconcile. 

See also SPARK-39969, which proposes making the hadoop-aws version of the 
aws-sdk-bundle the normative one, as it is now newer than the spark-kinesis 
import and more broadly tested.



