Dan Burkert commented on KUDU-1948:
I want to chime in here, since I've traditionally played devil's advocate for the
position that Kudu should _not_ have client configs. The 'guiding principle'
behind this argument is that libraries should not include a configuration
framework*. A configuration framework should purely be the concern of the
application.
This argument is muddied somewhat by a combination of factors:
* The split between application and library isn't always clear. In Kudu's
case it's clear that the master and tserver processes are applications, while
the clients are libraries. The provided CLI tools are less clear, but in this
context I consider them applications (and indeed, they already ship with the
gflags configuration framework).
* The JVM and its associated ecosystem have historically done a poor job of
distinguishing between applications and libraries. JAR files are meant to
serve both purposes, and as a result they serve each badly***.
* Hadoop and its associated ecosystem have historically done a poor job of
distinguishing between applications and libraries. The Hadoop Configuration
class/framework is used pervasively, which leads to a host of issues.
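To make the library/application split argued above concrete, here is a minimal Python sketch in which the library accepts configuration only through explicit APIs (a typed options object and an un-typed map, both of which footnote * places outside the definition of 'configuration framework'), while the application owns the parsing of a config file. All names here (`ClientOptions`, `Client`, `load_properties`, and the property keys, loosely modeled on the options listed in the issue description below) are hypothetical and are not the Kudu client API.

```python
from dataclasses import dataclass
from typing import Mapping


# ---- Library side (hypothetical; NOT the real Kudu client API) ----

@dataclass(frozen=True)
class ClientOptions:
    """Everything the library needs, supplied explicitly by the caller."""
    master_addresses: tuple[str, ...]
    require_authentication: bool = False
    require_encryption: bool = False
    sasl_protocol_name: str = "kudu"

    @classmethod
    def from_map(cls, props: Mapping[str, str]) -> "ClientOptions":
        """Un-typed, map-style API. Still explicit: the caller builds the map."""
        def as_bool(key: str) -> bool:
            return props.get(key, "false").strip().lower() == "true"

        addresses = tuple(
            a.strip() for a in props.get("master_addresses", "").split(",") if a.strip()
        )
        return cls(
            master_addresses=addresses,
            require_authentication=as_bool("require_authentication"),
            require_encryption=as_bool("require_encryption"),
            sasl_protocol_name=props.get("sasl_protocol_name", "kudu"),
        )


class Client:
    """Library entry point. It never reads files, env vars, or ZooKeeper."""

    def __init__(self, options: ClientOptions):
        if not options.master_addresses:
            raise ValueError("at least one master address is required")
        self.options = options


# ---- Application side: owns the configuration framework ----

def load_properties(text: str) -> dict[str, str]:
    """Parse 'key = value' lines; '#' starts a comment. A real application
    might read this text from a file, the environment, or a config service."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    config_text = """
    # one file instead of three hostnames on every command line
    master_addresses = m1:7051, m2:7051, m3:7051
    require_authentication = true
    """
    client = Client(ClientOptions.from_map(load_properties(config_text)))
    print(client.options.master_addresses)  # ('m1:7051', 'm2:7051', 'm3:7051')
```

The point of the design is that a CLI tool or Spark job can still offer "one config file" convenience, but that convenience lives in application code; the library only ever sees values handed to it.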
> If supported by kudu-spark this would help reduce the friction to
> reading/writing Kudu data – just put in your table name and go!
Client configs are often used as a poor substitute for service discovery**.
Although not widely recognized as such, Hadoop _already has_ a service
discovery component: the Hive MetaStore****. It's on the Kudu road map to
integrate with the HMS, at which point Spark and other users can discover Kudu
tables there, along with the information needed to connect (e.g. master
addresses). Note that the same guiding principle applies to service discovery:
only applications should be using it; libraries should never, for instance, have
a built-in HMS, ZooKeeper, or etcd connection.
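The same principle applied to service discovery can be sketched as follows: the application consults a catalog to turn a table name into master addresses, then hands only explicit values to the library. `Catalog`, `TableLocation`, `Client`, and `open_table` are hypothetical illustrations; `Catalog` is an in-memory stand-in for a metastore such as the HMS, and none of these names correspond to a real Kudu, Spark, or HMS API.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class TableLocation:
    """What discovery yields for one table (hypothetical shape)."""
    table_name: str
    master_addresses: tuple[str, ...]


class Catalog:
    """Hypothetical in-memory stand-in for a metastore such as the HMS.
    Only the APPLICATION talks to it."""

    def __init__(self, entries: Mapping[str, TableLocation]):
        self._entries = dict(entries)

    def lookup(self, table_name: str) -> TableLocation:
        try:
            return self._entries[table_name]
        except KeyError:
            raise LookupError(f"table {table_name!r} is not registered") from None


class Client:
    """Hypothetical library client: master addresses are passed in explicitly;
    it has no built-in catalog, ZooKeeper, or etcd connection."""

    def __init__(self, master_addresses: Sequence[str]):
        if not master_addresses:
            raise ValueError("at least one master address is required")
        self.master_addresses = tuple(master_addresses)


def open_table(catalog: Catalog, table_name: str) -> tuple[Client, str]:
    """Application-level glue: discover first, then configure the library."""
    location = catalog.lookup(table_name)
    return Client(location.master_addresses), location.table_name


if __name__ == "__main__":
    catalog = Catalog({"metrics": TableLocation("metrics", ("m1:7051", "m2:7051"))})
    client, name = open_table(catalog, "metrics")  # "put in your table name and go"
    print(name, client.master_addresses)
```

This keeps the "just put in your table name and go" experience as an application-level convenience (for example inside an integration such as kudu-spark), while the client library itself stays free of any discovery dependency.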
* In this context, 'configuration framework' means something that picks up
config properties from well known locations on disk, or from the environment,
or from a database/zookeeper, or more generally anything not passed explicitly
to the library through an API. Not included under 'configuration framework' are
APIs for passing configuration into the library, including builders and
un-typed map style APIs.
** They are a poor substitute because they are not centrally managed, so
changes must be pushed separately to every client configuration copy. Vendors
have papered over this by making that push easy with the equivalent of a
distributed scp, but the fundamental crappiness of the solution remains.
*** This is why, I'm convinced, patterns like dependency injection (DI)
flourish in Java. They are over-engineered band-aids which address the symptoms
of failing to keep the lines between library and application clean.
**** I'm fully aware of how absurd it is to suggest adding _yet another_
responsibility to the HMS, one it will inevitably be pretty poor at, but
the fact of the matter is that the HMS already serves this role. In my opinion
it's better to acknowledge that the HMS serves this role and work towards
improving its suitability than to indirectly paper over the issue with
> Client-side configuration of cluster details
> Key: KUDU-1948
> URL: https://issues.apache.org/jira/browse/KUDU-1948
> Project: Kudu
> Issue Type: New Feature
> Components: client, security
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Assignee: Grant Henke
> Priority: Major
> In the beginning, Kudu clients were configured with only the address of the
> single Kudu master. This was nice and simple, and there was no need for a
> client "configuration file".
> Then, we added multi-masters, and the client API had to take a list of master
> addresses. This wasn't awful, but started to be a bit aggravating when trying
> to use tools on a multi-master cluster (who wants to type out three long
> hostnames in a 'ksck' command line every time?).
> Now with security, we have a couple more bits of configuration for the
> client. Namely:
> - "require SSL" and "require authentication" booleans -- necessary to prevent
> MITM downgrade attacks
> - custom Kerberos principal -- if the server wants to use a principal other
> than 'kudu/<HOST>@REALM' then the client needs to know to expect it and fetch
> the appropriate service ticket. (Note: this isn't yet supported, but we would
> like it to be!)
> In the future, there are other items that might be best specified as part of
> a client configuration as well (e.g. CA cert for BYO PKI, wire compression
> options, etc.).
> For the above use cases it would be nicer to allow the various options to be
> specified in a configuration file rather than adding specific APIs for all
This message was sent by Atlassian JIRA