To give some Kerberos-specific examples, the spark-submit args:
--conf spark.yarn.keytab=path_to_keytab --conf
spark.yarn.principal=princi...@realm.com
are currently not passed through to the data sources.
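For concreteness, a full invocation carrying those two options might look like the following sketch (the application class, jar, keytab path, and principal are placeholders, not taken from the original message):

```shell
# Hypothetical spark-submit invocation passing Kerberos credentials
# via the legacy spark.yarn.* options; all names below are illustrative.
spark-submit \
  --master yarn \
  --conf spark.yarn.keytab=/etc/security/keytabs/app.keytab \
  --conf spark.yarn.principal=app@EXAMPLE.COM \
  --class com.example.App \
  app.jar
```

These options land in the session config, but (as noted above) nothing forwards them on to a V2 data source.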
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
I like the shared namespace option better than the white-listing option for
any newly defined configuration information.
All of the Kerberos options already exist in their own legacy locations
though - changing their location could break a lot of systems.
Perhaps we can use the shared
I believe the current Spark config system is unfortunate in the way it has
grown - you have no way of telling which subsystems use which
configuration options without direct and detailed reading of the code.
Isolating config items for data sources into a separate namespace (rather
than using a
Hi Jayesh,
I get where you are coming from - partitions are just an implementation
optimisation that we really shouldn’t be bothering the end user with.
Unfortunately, that view is like saying an RPC is just a procedure call, and
that the details of the network transport should be hidden from the end user.
The current V2 data source API provides support for querying a portion of the
Spark config namespace (spark.datasource.*) via the SessionConfigSupport API.
This was designed with the assumption that all configuration information for
v2 data sources should be separate from each other.
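To illustrate the mechanism: a V2 source implementing SessionConfigSupport declares a key prefix, and Spark copies session-config entries under spark.datasource.&lt;prefix&gt;.* into that source's options. The sketch below mimics that filtering step in plain Java; extractOptions is an illustrative stand-in, not the actual Spark internal helper, and the config keys are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionConfigDemo {
    // Mimics the SessionConfigSupport behaviour: collect the session-config
    // entries under spark.datasource.<keyPrefix>.* and strip the prefix,
    // yielding the per-source options map. Illustrative only.
    static Map<String, String> extractOptions(Map<String, String> sessionConf,
                                              String keyPrefix) {
        String prefix = "spark.datasource." + keyPrefix + ".";
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                options.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.datasource.mysource.user", "alice");      // picked up
        conf.put("spark.datasource.othersource.user", "bob");     // other source
        conf.put("spark.yarn.keytab", "/etc/security/app.keytab"); // ignored

        // Only mysource's keys survive, with the namespace prefix removed;
        // the spark.yarn.* Kerberos options never reach the source.
        System.out.println(extractOptions(conf, "mysource"));
    }
}
```

Note how the spark.yarn.* entries fall outside the shared namespace, which is exactly why the Kerberos options above are invisible to the data source.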
I've been following the development of the new data source abstraction with
keen interest. One of the issues that has occurred to me as I sat down and
planned how I would implement a data source is how I would support
manipulating partitions.
My reading of the current prototype is that Data