Jay, thanks for the clarification. I think I understand it a bit better now, and it all pretty much makes sense. See inline.

Also, slightly OT, but what are Configurable and AbstractConfig#getConfiguredInstance for? I don't see anything using them.

On 2/6/14 11:46 AM, Jay Kreps wrote:
Hey David,

You raise a couple points, let me give the rationale and see if you are
convinced.

1. Why not reuse commons config? This is exactly the right question for
application programming. We could save ourselves a few hundred lines of
code we would maintain by just adding a library dependency, which is almost
always the right decision if you are building an application as the
application is itself the end result. But the issue here is that these
clients will be embedded in thousands of applications which will in turn
have other dependencies. So as a result any common libraries we use will
also be used by others and this inevitably leads to
dependency/incompatibility hell. As a result I really think for this kind
of helper code we should just do it ourselves. I think of this as kind of
our sacrifice for the users. :-)

Meanwhile since we are decoupling the client and the server and since the
server should always be a stand alone thing we should be able to be much
more liberal in our dependencies there.
Yeah, good point. I'd like to think that reusing existing stuff is always the right thing to do, but I've been through dependency hell enough times to know that's not the reality.

2. The next question is why not use a commons-config-like approach where
the config object is a map/properties thing and we wrap it in a pojo that
provides getters that parse values. This is the approach we took in the
existing client and server code. The problem with this is that it is
impossible to determine the set of configs programmatically. The goal was
to have our config documentation automatically generated off the code
(including the type, default value, documentation, etc). This also has the
nice side effect that all the configs are validated up front, you don't
have to wait until someone calls the getter to check validity.
I can see where the declarative stuff is useful for auto-generating documentation; however, that's another thing we'd have to build - that is, of course, if you haven't built it already ;)

Loading the config eagerly also breaks the ability for config to change at runtime, which can be appealing for things like buffer sizes, timeouts, etc. I've been using Archaius lately, and it's pretty awesome. But again, it's more geared towards application development (which is where I'm coming from).
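For example, with Archaius you get a handle that re-reads the underlying config source, so the value can change without a restart. A rough sketch (Archaius 1.x API from com.netflix.config; the property name is made up):

import com.netflix.config.DynamicIntProperty;
import com.netflix.config.DynamicPropertyFactory;

public class BufferSettings {
    // the handle re-reads the backing config source, so the value can change
    // at runtime without a restart ("producer.buffer.size" is just an example key)
    private static final DynamicIntProperty BUFFER_SIZE =
        DynamicPropertyFactory.getInstance().getIntProperty("producer.buffer.size", 16384);

    public int currentBufferSize() {
        return BUFFER_SIZE.get();
    }
}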

3. This approach doesn't prohibit custom validation. All validation is done
with a Validator, so you can plug in any validation you like. This can be a
bit ugly for one-off code (anonymous inner class, yuck...but it will get
better with Java 8, and in Scala it is already good). But we can also just do
validation in line in the wiring class. I see the "responsibility" of
KafkaProducer or KafkaServer classes as turning configs into a constructed
assembled hierarchy of objects, so checking stuff there is totally legit.
Gotcha
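Just to check my understanding, a one-off check would look roughly like this? (Only a sketch; I'm assuming define() takes a Validator alongside the default, as in your metadata.timeout.ms example, and that the interface is roughly ensureValid(name, value). The key and the exception type here are made up.)

config.define("zookeeper.connect", Type.STRING, "", new Validator() {
        public void ensureValid(String name, Object value) {
            String connect = (String) value;
            // e.g. require host:port[,host:port...][/chroot]
            if (!connect.matches("([\\w.-]+:\\d+)(,[\\w.-]+:\\d+)*(/.*)?"))
                throw new ConfigException(name + " is not a valid ZooKeeper connect string");
        }
    }, "documentation");

A bit wordy, but workable.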

4. If the pojo wrapper you describe is just an internal convenience for our
code (as Joel described), then I have no objection to that. But the
objection to a user facing pojo was what I described before...
An internal (not user-facing) POJO wrapper might be nice so we _do_ get compile-time errors when trying to access non-existent configs.
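Something like this is what I had in mind (sketch only; the class and getter names are assumptions):

// internal-only wrapper: producer code calls typed accessors, so referencing a
// config that doesn't exist fails at compile time instead of at runtime
class ProducerConfigAccessor {
    private final AbstractConfig config;

    ProducerConfigAccessor(AbstractConfig config) { this.config = config; }

    int maxPartitionSize() { return config.getInt("max.partition.size"); }
}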


Make sense?

-Jay


On Thu, Feb 6, 2014 at 6:27 AM, David Arthur <mum...@gmail.com> wrote:

The declarative approach strikes me as a bit odd. Why not put the
precondition logic in a POJO wrapper? Also, why reinvent
commons-configuration? It's got a nice API and tons of people use it.

public class ProducerConfig extends org.apache.commons.configuration.AbstractConfiguration {

    /**
     * Blah, short doc {@link #getRequiredAcks()}
     */
    public static final String REQUIRED_ACKS_CONFIG = "request.required.acks";

    /**
     * Blah blah, detailed docs
     */
    public int getRequiredAcks() {
        int acks = super.getInt(REQUIRED_ACKS_CONFIG, 1);
        if (acks < -1 || acks > Short.MAX_VALUE) {
            // precondition check happens at read time, not at construction time
            throw new ConfigurationRuntimeException("Config precondition failed, " +
                REQUIRED_ACKS_CONFIG + " must be between -1 and 32767");
        }
        return acks;
    }
}

After typing that all out, I can see how the declarative config definition saves a bit of boilerplate, but it also limits you to the preconditions it supports. E.g., what if you wanted to validate a ZK connect string, as you could in the POJO wrapper?

-David


On 2/4/14 12:34 PM, Jay Kreps wrote:

We touched on this a bit in previous discussions, but I wanted to draw out
the approach to config specifically as an item of discussion.

The new producer and consumer use a similar key-value config approach as
the existing scala clients but have different implementation code to help
define these configs. The plan is to use the same approach on the server,
once the new clients are complete; so if we agree on this approach it will
be the new default across the board.

Let me split this into two parts. First I will try to motivate the use of
key-value pairs as a configuration api. Then let me discuss the mechanics
of specifying and parsing these. If we agree on the public api then the
implementation details are interesting, as this will be
shared across producer, consumer, and broker and potentially some tools;
but if we disagree about the api then there is no point in discussing the
implementation.

Let me explain the rationale for this. In a sense a key-value map of
configs is the worst possible API to the programmer using the clients. Let
me contrast the pros and cons versus a POJO and motivate why I think it is
still superior overall.

Pro: An application can externalize the configuration of its kafka clients
into its own configuration. Whatever config management system the client
application is using will likely support key-value pairs, so the client
should be able to directly pull whatever configurations are present and
use
them in its client. This means that any configuration the client supports
can be added to any application at runtime. With the pojo approach the
client application has to expose each pojo getter as some config
parameter.
The result of many applications doing this is that the config is different
for each and it is very hard to have a standard client config shared across
applications. Moving config into config files allows the usual tooling (version
control, review, audit, config deployments separate from code pushes,
etc.).
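For example, an application could do roughly the following (just an illustrative sketch; the file name, the "kafka." prefix, and a KafkaProducer(Properties) constructor are all assumptions here):

import java.io.FileInputStream;
import java.util.Properties;

public class KafkaClientWiring {
    public static void main(String[] args) throws Exception {
        // lift kafka.* keys out of the application's own config file and hand
        // them to the client untouched
        Properties app = new Properties();
        try (FileInputStream in = new FileInputStream("application.properties")) {
            app.load(in);
        }
        Properties kafkaProps = new Properties();
        for (String name : app.stringPropertyNames()) {
            if (name.startsWith("kafka."))
                kafkaProps.put(name.substring("kafka.".length()), app.getProperty(name));
        }
        KafkaProducer producer = new KafkaProducer(kafkaProps);
    }
}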

Pro: Backwards and forwards compatibility. Provided we stick to our java
api many internals can evolve and expose new configs. The application can
support both the new and old client by just specifying a config that will
be unused in the older version (and of course the reverse--we can remove
obsolete configs).

Pro: We can use a similar mechanism for both the client and the server.
Since most people run the server as a stand-alone process it needs a config file.

Pro: Systems like Samza that need to ship configs across the network can
easily do so as configs have a natural serialized form. This can be done
with pojos using java serialization but it is ugly and has bizarre failure
cases.

Con: The IDE gives nice auto-completion for pojos.

Con: There are some advantages to javadoc as a documentation mechanism for
java people.

Basically to me this is about operability versus niceness of api and I
think operability is more important.

Let me now give some details of the config support classes in
kafka.common.config and how they are intended to be used.

The goal of this code is the following:
1. Make specifying configs, their expected type (string, numbers, lists,
etc) simple and declarative
2. Allow for simple validation checks (numeric range checks, etc.)
3. Make the config "self-documenting". I.e. we should be able to write
code
that generates the configuration documentation off the config def.
4. Specify default values.
5. Track which configs actually get used.
6. Make it easy to get config values.

There are two classes there: ConfigDef and AbstractConfig. ConfigDef
defines the specification of the accepted configurations and
AbstractConfig
is a helper class for implementing the configuration class. The difference
is kind of like the difference between a "class" and an "object":
ConfigDef
is for specifying the configurations that are accepted, AbstractConfig is
the base class for an instance of these configs.

You can see this in action here:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD

(Ignore the static config names in there for now...I'm not actually sure
that is the best approach).

So the way this works is that the config specification is defined as:

    config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
                            .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
                            .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");


This is used in a ProducerConfig class which extends AbstractConfig to get
access to some helper methods as well as the logic for tracking which
configs get accessed.
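Roughly, the wiring looks like this (a sketch of the intended shape rather than the exact code; the constructor and getter names may differ):

public class ProducerConfig extends AbstractConfig {
    private static final ConfigDef config = new ConfigDef()
        .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation");

    public ProducerConfig(Map<?, ?> props) {
        super(config, props);  // parse, apply defaults, validate everything up front
    }
}

// inside KafkaProducer: typed access, and each access is recorded so we can
// report which configs were actually used
long metadataTimeout = producerConfig.getLong("metadata.timeout.ms");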

Currently I have included static String variables for each of the config
names in that class. However I actually think that is not very helpful as
the javadoc for them doesn't give the constant value and requires
duplicating the documentation. To understand this point look at the
javadoc
and note that the doc on the string is not the same as what we define in
the ConfigDef. We could just have the javadoc for the config string be the
source of truth, but it is actually pretty inconvenient for that as it doesn't
show you the value of the constant, just the variable name (unless you
discover how to unhide it). That is fine for the clients, but for the
server would be very weird especially for non-java people. We could
attempt
to duplicate documentation between the javadoc and the ConfigDef but given
our struggle to get well-documented config in a single place this seems
unwise.

So I recommend we have a single source of documentation for these: the website
documentation on configuration, covering both clients and server, and
generated off the config defs. The
javadoc on KafkaProducer will link to this table so it should be quite
convenient to discover. This makes things a little more typo prone, but
that should be easily caught by the key detection. This will also make it
possible for us to retire configs in the future without causing compile
failures and add configs without having use of them break backwards
compatibility. This is useful during upgrades where you want to be
compatible with the old and new version so you can roll forwards and
backwards.
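Generating that table could be as simple as iterating over the config def. Purely a sketch, and it assumes ConfigDef can enumerate its entries, which may require adding a small accessor:

// hypothetical doc generator; the configKeys() accessor and ConfigKey fields
// are guesses at what such an API would look like
StringBuilder html = new StringBuilder("<table>\n");
html.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
for (ConfigDef.ConfigKey key : config.configKeys().values())
    html.append("<tr><td>").append(key.name)
        .append("</td><td>").append(key.type)
        .append("</td><td>").append(key.defaultValue)
        .append("</td><td>").append(key.documentation)
        .append("</td></tr>\n");
html.append("</table>");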

-Jay


