[jira] [Comment Edited] (KUDU-1948) Client-side configuration of cluster details

Dan Burkert (JIRA) Mon, 12 Feb 2018 14:32:39 -0800

    [ 
https://issues.apache.org/jira/browse/KUDU-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361544#comment-16361544
 ]


Dan Burkert edited comment on KUDU-1948 at 2/12/18 10:31 PM:
-------------------------------------------------------------

I want to chime in here since I've traditionally taken the devil's advocate 
position that Kudu should _not_ have client configs. The 'guiding principle' 
behind this argument is that libraries should not include a configuration 
framework[1].  A configuration framework should purely be the concern of the 
end-user application.
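To make the library/application split concrete, here is a minimal Python sketch. `ClientBuilder`, `ClientOptions`, `application_main`, and the `KUDU_MASTERS` environment variable are hypothetical names chosen for illustration; they are not part of Kudu's actual client API. The library only accepts values through a builder, and the application alone decides where those values come from.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ClientOptions:
    """Everything the (hypothetical) client library needs, passed in explicitly."""
    master_addresses: List[str]
    require_authentication: bool = False


class ClientBuilder:
    """Hypothetical library-side builder: no file, environment, or network lookups."""

    def __init__(self, master_addresses: List[str]):
        if not master_addresses:
            raise ValueError("at least one master address is required")
        self._masters = list(master_addresses)
        self._require_authn = False

    def require_authentication(self, value: bool) -> "ClientBuilder":
        self._require_authn = value
        return self

    def build(self) -> ClientOptions:
        return ClientOptions(self._masters, self._require_authn)


def application_main(env: Mapping[str, str]) -> ClientOptions:
    # Application-side "configuration framework": here just one environment
    # variable. KUDU_MASTERS is an illustrative name, not a real Kudu setting.
    raw = env.get("KUDU_MASTERS", "")
    masters = [m.strip() for m in raw.split(",") if m.strip()]
    return ClientBuilder(masters).require_authentication(True).build()
```

In a real program the application would pass `os.environ` (or values from whatever configuration framework it already uses) to `application_main`; the library code never reads either source itself.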

This argument is muddied somewhat by a combination of factors:
 * The split between application and library isn't always clear.  In Kudu's 
case it's clear that the master and tserver processes are applications, while 
the clients are libraries.  The provided CLI tools are less clear, but in this 
context I consider them applications (and indeed, they already ship with the 
gflags configuration framework).
 * The JVM and its associated ecosystem have historically done a poor job of 
distinguishing between applications and libraries.  JAR files are meant to 
serve both purposes, and as a result they serve each badly[3].
 * Hadoop and its associated ecosystem have historically done a poor job of 
distinguishing between applications and libraries.  The Hadoop Configuration 
class/framework is used pervasively, which leads to a host of issues.

{quote}If supported by kudu-spark this would help reduce the friction to 
reading/writing Kudu data – just put in your table name and go!
{quote}

Client configs are often used as a poor substitute for service discovery[2].  
Although not widely recognized as such, Hadoop _already has_ a service 
discovery component: the Hive MetaStore (HMS)[4].  It's on the Kudu roadmap to 
integrate with the HMS, at which point Spark and other users can discover Kudu 
tables there, along with the information needed to connect (e.g., master 
addresses).  Note that the same guiding principle applies to service discovery: 
only applications should use it; libraries should never, for instance, have a 
built-in HMS, ZooKeeper, or etcd connection.
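A rough sketch of the same principle applied to discovery, assuming a hypothetical `Catalog` class as an in-memory stand-in for the HMS (it does not model the real HMS API) and a hypothetical `KuduLikeClient` library class. The application resolves a table name to master addresses and hands them to the client, which has no catalog connection of its own.

```python
from typing import Dict, List


class Catalog:
    """Hypothetical in-memory stand-in for a metadata service such as the HMS.

    Maps table names to the master addresses needed to reach them. A real
    deployment would query the service over the network; this sketch does not.
    """

    def __init__(self, entries: Dict[str, List[str]]):
        self._entries = dict(entries)

    def lookup_masters(self, table: str) -> List[str]:
        try:
            return list(self._entries[table])
        except KeyError:
            raise LookupError(f"table {table!r} is not registered") from None


class KuduLikeClient:
    """Hypothetical library client: it is handed addresses, it never discovers them."""

    def __init__(self, master_addresses: List[str]):
        self.master_addresses = list(master_addresses)

    def open_table(self, name: str) -> str:
        # Placeholder for real RPCs to the masters; returns a descriptive string.
        return f"{name}@{','.join(self.master_addresses)}"


def open_by_table_name(catalog: Catalog, table: str) -> str:
    # Discovery happens here, in application code; the library stays unaware
    # of where the addresses came from.
    masters = catalog.lookup_masters(table)
    return KuduLikeClient(masters).open_table(table)
```

Swapping the `Catalog` for a different discovery mechanism only changes application code; `KuduLikeClient` is untouched.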

 [1] In this context, 'configuration framework' means something that picks up 
config properties from well-known locations on disk, or from the environment, 
or from a database/ZooKeeper, or more generally anything not passed explicitly 
to the library through an API.  Not included under 'configuration framework' 
are APIs for passing configuration into the library, including builders and 
un-typed map-style APIs.

[2] They are a poor substitute because they are not centrally managed, so 
changes must be pushed separately to every client configuration copy.  Vendors 
have papered over this by making the push easy with the equivalent of a 
distributed scp, but the fundamental crappiness of the solution remains.

[3] This is why, I'm convinced, patterns like dependency injection (DI) 
flourish in Java.  They are over-engineered band-aids that address the symptoms 
of failing to keep the lines between library and application clean.

[4] I'm fully aware of how absurd it is to suggest adding _yet another_ 
responsibility to the HMS, one it will inevitably be pretty poor at, but 
the fact of the matter is that the HMS already serves this role.  In my opinion 
it's better to acknowledge that the HMS serves this role and work towards 
improving its suitability than to indirectly paper over the issue with 
client-side configs.



> Client-side configuration of cluster details
> --------------------------------------------
>
>                 Key: KUDU-1948
>                 URL: https://issues.apache.org/jira/browse/KUDU-1948
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, security
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Grant Henke
>            Priority: Major
>
> In the beginning, Kudu clients were configured with only the address of the 
> single Kudu master. This was nice and simple, and there was no need for a 
> client "configuration file".
> Then, we added multi-masters, and the client API had to take a list of master 
> addresses. This wasn't awful, but started to be a bit aggravating when trying 
> to use tools on a multi-master cluster (who wants to type out three long 
> hostnames in a 'ksck' command line every time?).
> Now with security, we have a couple more bits of configuration for the 
> client. Namely:
> - "require SSL" and "require authentication" booleans -- necessary to prevent 
> MITM downgrade attacks
> - custom Kerberos principal -- if the server wants to use a principal other 
> than 'kudu/<HOST>@REALM' then the client needs to know to expect it and fetch 
> the appropriate service ticket. (Note: this isn't supported yet, but we would 
> like it to be!)
> In the future, there are other items that might be best specified as part of 
> a client configuration as well (e.g. CA cert for BYO PKI, wire compression 
> options, etc).
> For the above use cases it would be nicer to allow the various options to be 
> specified in a configuration file rather than adding specific APIs for all 
> options.
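The options listed in the issue could still live in a file without the client library reading that file itself. The sketch below assumes a hypothetical INI layout and a hypothetical `SecurityOptions.from_map` un-typed map API (neither exists in Kudu). The application parses the file with Python's standard `configparser`, and the library only validates the map it is given, rejecting unknown keys so that a mistyped security flag fails loudly instead of being silently ignored.

```python
import configparser
from dataclasses import dataclass
from typing import List, Mapping, Optional

_BOOLS = {"true": True, "false": False}
_KNOWN_KEYS = {"master_addresses", "require_ssl",
               "require_authentication", "kerberos_principal"}


@dataclass(frozen=True)
class SecurityOptions:
    master_addresses: List[str]
    require_ssl: bool = False
    require_authentication: bool = False
    # None means "expect the default kudu/<HOST>@REALM principal".
    kerberos_principal: Optional[str] = None

    @classmethod
    def from_map(cls, conf: Mapping[str, str]) -> "SecurityOptions":
        """Hypothetical un-typed map API: validates input but never reads files."""
        unknown = set(conf) - _KNOWN_KEYS
        if unknown:
            raise ValueError(f"unknown option(s): {sorted(unknown)}")

        def flag(key: str) -> bool:
            raw = conf.get(key, "false").strip().lower()
            if raw not in _BOOLS:
                raise ValueError(f"{key} must be true or false, got {raw!r}")
            return _BOOLS[raw]

        raw_masters = conf.get("master_addresses", "")
        masters = [m.strip() for m in raw_masters.split(",") if m.strip()]
        if not masters:
            raise ValueError("master_addresses is required")
        return cls(masters, flag("require_ssl"),
                   flag("require_authentication"),
                   conf.get("kerberos_principal"))


def load_from_ini(text: str, section: str = "cluster") -> SecurityOptions:
    # Application-side: the application picks the file format and location,
    # then hands the library a plain dict.
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return SecurityOptions.from_map(dict(parser[section]))
```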



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
