Re: [Discuss] Datasource v2 support for Kerberos

2018-09-25 Thread tigerquoll
To give some Kerberos-specific examples, the spark-submit args: --conf spark.yarn.keytab=path_to_keytab --conf spark.yarn.principal=princi...@realm.com are currently not passed through to the data sources. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
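A full invocation illustrating the arguments mentioned above might look like the following. This is a hypothetical sketch: the application class, jar name, keytab path, and principal are all placeholders, not values from the original message.

```shell
# Hypothetical spark-submit invocation showing the Kerberos confs
# discussed in this thread; all concrete values are placeholders.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --conf spark.yarn.keytab=/etc/security/keytabs/myapp.keytab \
  --conf spark.yarn.principal=myapp@EXAMPLE.COM \
  myapp.jar
```

The point of the thread is that, even when supplied this way, these values do not reach a V2 data source through SessionConfigSupport, because they live outside the spark.datasource.* namespace.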

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-24 Thread tigerquoll
I like the shared namespace option better than the whitelisting option for any newly defined configuration information. All of the Kerberos options already exist in their own legacy locations though - changing their location could break a lot of systems. Perhaps we can use the shared
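The backwards-compatibility idea in this message could be sketched as mirroring the legacy Kerberos keys into a shared namespace without moving them, so existing systems keep working. Everything below is illustrative: the spark.datasource.shared.* key names are an assumption for the sketch, not an actual Spark convention.

```python
# Sketch: mirror legacy Kerberos confs into a hypothetical shared
# namespace while leaving the legacy keys untouched. The shared key
# names are invented for illustration; this is not a Spark API.

LEGACY_ALIASES = {
    "spark.yarn.keytab": "spark.datasource.shared.kerberos.keytab",
    "spark.yarn.principal": "spark.datasource.shared.kerberos.principal",
}

def with_shared_aliases(conf):
    """Return a copy of conf with legacy keys also visible under the shared namespace."""
    merged = dict(conf)
    for legacy, shared in LEGACY_ALIASES.items():
        # Only mirror; never overwrite an explicitly set shared key.
        if legacy in conf and shared not in conf:
            merged[shared] = conf[legacy]
    return merged
```

A data source would then read only the shared namespace, while older tooling keeps setting the legacy keys.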

Re: [Discuss] Datasource v2 support for Kerberos

2018-09-23 Thread tigerquoll
I believe the current Spark config system is unfortunate in the way it has grown - you have no way of telling which subsystems use which configuration options without direct and detailed reading of the code. Isolating config items for datasources into separate namespaces (rather than using a

Re: [Discuss] Datasource v2 support for manipulating partitions

2018-09-17 Thread tigerquoll
Hi Jayesh, I get where you are coming from - partitions are just an implementation optimisation that we really shouldn't be bothering the end user with. Unfortunately, that view is like saying an RPC is just a procedure call, and that details of the network transport should be hidden from the end user.

[Discuss] Datasource v2 support for Kerberos

2018-09-16 Thread tigerquoll
The current V2 Datasource API provides support for querying a portion of the Spark configuration namespace (spark.datasource.*) via the SessionConfigSupport API. This was designed with the assumption that the configuration for each v2 data source should be kept separate from that of the others.
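The scoping behaviour described above can be modelled roughly as follows: session config keys under spark.datasource.<keyPrefix>.* are selected and the prefix is stripped before the options reach the source. This is a simplified sketch of the mechanism, not the actual Spark implementation, and the example key names are invented.

```python
# Rough model of SessionConfigSupport-style scoping: a data source
# declares a key prefix, and only session confs under
# spark.datasource.<prefix>.* are passed through, with the prefix
# stripped. Illustrative only; not the real Spark code.

def extract_session_configs(session_conf, key_prefix):
    prefix = f"spark.datasource.{key_prefix}."
    return {
        key[len(prefix):]: value
        for key, value in session_conf.items()
        if key.startswith(prefix)
    }
```

Under this model it is easy to see why spark.yarn.keytab and spark.yarn.principal never reach a data source: they sit outside the spark.datasource.* namespace entirely.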

[Discuss] Datasource v2 support for manipulating partitions

2018-09-16 Thread tigerquoll
I've been following the development of the new data source abstraction with keen interest. One of the issues that occurred to me as I sat down and planned how I would implement a data source is how I would support manipulating partitions. My reading of the current prototype is that Data
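The kind of partition-manipulation hooks this message is asking about might be sketched as an optional mixin interface that a source can implement. To be clear, none of these names exist in the actual DataSource V2 API; this is a hypothetical illustration of the shape such support could take.

```python
# Hypothetical sketch of optional partition-manipulation hooks for a
# data source. The interface and class names are invented for
# illustration and are not part of the DataSource V2 API.
from abc import ABC, abstractmethod

class SupportsPartitionManagement(ABC):
    """Mixin a source could implement to expose partition operations."""

    @abstractmethod
    def list_partitions(self):
        """Return the partition specs currently known to the source."""

    @abstractmethod
    def add_partition(self, spec):
        """Register a new partition described by a spec (e.g. a dict of column values)."""

    @abstractmethod
    def drop_partition(self, spec):
        """Remove an existing partition."""

class InMemorySource(SupportsPartitionManagement):
    """Toy implementation backed by a list, for demonstration only."""

    def __init__(self):
        self._partitions = []

    def list_partitions(self):
        return list(self._partitions)

    def add_partition(self, spec):
        if spec not in self._partitions:
            self._partitions.append(spec)

    def drop_partition(self, spec):
        self._partitions.remove(spec)
```

An engine that recognises the mixin could then route ALTER TABLE ... ADD/DROP PARTITION-style operations to the source instead of treating partitions as purely internal.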