The current V2 Datasource API lets a data source read a portion of the
SparkConf namespace (spark.datasource.*) via the SessionConfigSupport API.
This was designed on the assumption that each v2 data source's
configuration should be kept separate from that of every other source.
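
To make the isolation concrete, here is a rough, self-contained sketch of
prefix-based propagation. It is not Spark's actual code: the class name,
the "mysource" key prefix and the sample keys are made up for illustration,
and I am assuming each source only sees keys under
spark.datasource.<keyPrefix>. with that prefix stripped.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mimics prefix-based config propagation, not Spark's code.
final class SessionConfigSketch {

    // Returns the entries under "spark.datasource.<keyPrefix>." with that
    // prefix stripped from each key. Everything else is dropped.
    static Map<String, String> extract(Map<String, String> sessionConf,
                                       String keyPrefix) {
        String prefix = "spark.datasource." + keyPrefix + ".";
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.datasource.mysource.url", "host:1234"); // hypothetical
        conf.put("spark.datasource.other.url", "elsewhere:1");  // another source
        conf.put("spark.yarn.keytab", "/path/to/user.keytab");  // global setting
        // Only the "mysource" entry survives; the global keytab is invisible.
        System.out.println(extract(conf, "mysource"));
    }
}
```

Under that assumption, a setting defined outside a source's own namespace
never reaches the source.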

Unfortunately, there are some cross-cutting concerns, such as
authentication, that touch multiple data sources. This means that common
configuration items need to be shared among multiple data sources.
In particular, Kerberos setup can use the following configuration items:

* userPrincipal
* userKeytabPath
* krb5ConfPath
* kerberos debugging flags
* JAAS config
* ZKServerPrincipal ??

The potential solutions I can think of for passing this information to the
various data sources are:

* Pass the entire SparkContext object to data sources (not likely)
* Pass the entire SparkConf (as a Map) to data sources
* Pass all required configuration via environment variables
* Extend SessionConfigSupport to support passing specific white-listed
configuration values
* Add a specific data source v2 API "SupportsKerberos" so that a data source
can indicate that it supports Kerberos and also provide the means to pass
needed configuration info.
* Expand out all Kerberos configuration items to be in each data source
config namespace that needs it.
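
To give a feel for the white-list and "SupportsKerberos" options, here is a
minimal sketch in plain Java. Nothing in it exists in Spark today: the
interface shape, its method names, the helper class and the
spark.example.kerberos.* keys are all assumptions of mine, meant only to
start the discussion.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mix-in: a source declares which global keys it needs.
interface SupportsKerberos {
    // Global configuration keys this source asks to receive.
    List<String> requiredKerberosKeys();

    // Called by the framework with only the requested values.
    void setKerberosConfig(Map<String, String> kerberosConf);
}

// Hypothetical framework-side helper.
final class KerberosConfigSketch {

    // Copies only the allowed keys that are actually set in sparkConf.
    static Map<String, String> select(Map<String, String> sparkConf,
                                      List<String> allowedKeys) {
        Map<String, String> out = new HashMap<>();
        for (String key : allowedKeys) {
            String value = sparkConf.get(key);
            if (value != null) {
                out.put(key, value);
            }
        }
        return out;
    }

    // Hands the white-listed values to sources that opt in; others are untouched.
    static void configure(Object source, Map<String, String> sparkConf) {
        if (source instanceof SupportsKerberos) {
            SupportsKerberos ks = (SupportsKerberos) source;
            ks.setKerberosConfig(select(sparkConf, ks.requiredKerberosKeys()));
        }
    }
}
```

The point of this shape is that a source never sees the whole
configuration, only the keys it asked for, and sources that do not
implement the interface are unaffected.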

If a data source requires TLS support, then we also need to support passing
all the configuration values under "spark.ssl.*".

What do people think? A placeholder issue has been added at SPARK-25329.
