Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Colin McCabe Mon, 13 Feb 2017 11:52:17 -0800

On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> Hey Colin,
> 
> Thanks for the hard work on this. I know going back and forth on APIs is
> kind of frustrating but we're at the point where these things live long
> enough and are used by enough people that it is worth the pain. I'm sure
> it'll come down in the right place eventually. A couple things I've found
> helped in the past:
> 
>    1. The burden of evidence needs to fall on the complicator. i.e. if
>    person X thinks the api should be async they need to produce a set of
>    common use cases that require this. Otherwise you are perpetually
>    having to
>    think "we might need x". I think it is good to have a rule of "simple
>    until
>    proven insufficient".
>    2. Make sure we frame things for the intended audience. At this point
>    our apis get used by a very broad set of Java engineers. This is a
>    very
>    different audience from our developer mailing list. These people code
>    for a
>    living not necessarily as a passion, and may not understand details of
>    the
>    internals of our system or even basic things like multi-threaded
>    programming. I don't think this means we want to dumb things down, but
>    rather try really hard to make things truly simple when possible.
> 
> Okay here were a couple of comments:
> 
>    1. Conceptually what is a TopicContext? I think it means something
>    like
>    TopicAdmin? It is not literally context about Topics right? What is
>    the
>    relationship of Contexts to clients? Is there a threadsafety
>    difference?
>    Would be nice to not have to think about this, this is what I mean by
>    "conceptual weight". We introduce a new concept that is a bit nebulous
>    that
>    I have to figure out to use what could be a simple api. I'm sure
>    you've
>    been through this experience before where you have these various
>    objects
>    and you're trying to figure out what they represent (the connection to
>    the
>    server? the information to create a connection? a request session?).


The intention was to provide some grouping of methods, and also a place
to put request parameters which were often set to defaults rather than
being explicitly set.  If it seems complex, we can certainly get rid of
it.

>    2. We've tried to avoid the Impl naming convention. In general the
>    rule
>    has been if there is only going to be one implementation you don't
>    need an
>    interface. If there will be multiple, distinguish it from the others.
>    The
>    other clients follow this pattern: Producer, KafkaProducer,
>    MockProducer;
>    Consumer, KafkaConsumer, MockConsumer.

Good point.  Let's change the interface to KafkaAdminClient, and the
implementation to AdminClient.

>    3. We generally don't use setters or getters as a naming convention. I
>    personally think mutating the setting in place seems kind of like late
>    90s
>    Java style. I think it likely has thread-safety issues. i.e. even if
>    it is
>    volatile you may not get the value you just set if there is another
>    thread... I actually really liked what you described as your original
>    idea
>    of having a single parameter object like CreateTopicRequest that holds
>    all
>    these parameters and defaults. This lets you evolve the api with all
>    the
>    various combinations of arguments without overloading insanity. After
>    doing
>    literally tens of thousands of remote APIs at LinkedIn we eventually
>    converged on a rule, which is ultimately every remote api needs a
>    single
>    argument object you can add to over time and it must be batched. Which
>    brings me to my next point...

Just to clarify, volatiles were never a part of the proposal.  I think
that context objects or request objects should be used by a single
thread at a time.

I'm not opposed to request objects, but I think they raise all the same
questions as context objects.  Basically, the thread-safety issues need
to be spelled out and understood by the user, and the user needs more
lines of code to make a request.  And there will be people trying to do
things like re-use request objects when they should not, and so forth.

>    4. I agree batch apis are annoying but I suspect we'll end up adding
>    one. Doing 1000 requests for 1000 operations if creating or deleting
>    will
>    be bad, right? This won't be the common case, but when you do it it
>    will be
>    a deal-breaker problem. I don't think we should try to fix this one
>    behind
>    the scenes.
>    5. Are we going to do CompletableFuture (which requires java 8) or
>    normal Future? Normal Future is utterly useless for most things other
>    than
>    just calling wait. If we can evolve in place from Future to
>    CompletableFuture that is fantastic (we could do it for the producer
>    too!).
>    My belief was that this was binary incompatible but I actually don't
>    know
>    (obviously it's source compatible).

In my testing, replacing a return type with a subclass of that return
type did not break binary compatibility.  I haven't been able to find
chapter and verse on this from the Java implementers, though.

best,
Colin


> 
> -Jay
> 
> On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi all,
> >
> > I made some major revisions to the proposal on the wiki, so please check
> > it out.
> >
> > The new API is based on Ismael's suggestion of grouping related APIs.
> > There is only one layer of grouping.  I think that it's actually pretty
> > intuitive.  It's also based on the idea of using Futures, which several
> > people commented that they'd like to see.
> >
> > Here's a simple example:
> >
> >  > AdminClient client = new AdminClientImpl(myConfig);
> >  > try {
> >  >   client.topics().create("foo", 3, (short) 2, false).get();
> >  >   Collection<String> topicNames = client.topics().list(false).get();
> >  >   log.info("Found topics: {}", Utils.mkString(topicNames, ", "));
> >  >   Collection<Node> nodes = client.nodes().list().get();
> >  >   log.info("Found cluster nodes: {}", Utils.mkString(nodes, ", "));
> >  > } finally {
> >  >   client.close();
> >  > }
> >
> > The good thing is, there is no Try, no 'get' prefixes, no messing with
> > batch APIs.  If there is an error, then Future#get() throws an
> > ExecutionException which wraps the relevant exception in the standard
> > Java way.
> >
> > Here's a slightly less simple example:
> >
> > > AdminClient client = new AdminClientImpl(myConfig);
> > > try {
> > >   List<Future<Void>> futures = new LinkedList<>();
> > >   for (String topicName: myNewTopicNames) {
> > >     creations.add(client.topics().
> > >         setClientTimeout(30000).setCreationConfig(myTopicConfig).
> > >           create(topicName, 3, (short) 2, false));
> > >   }
> > >   Futures.waitForAll(futures);
> > > } finally {
> > >   client.close();
> > > }
> >
> > I went with Futures because I feel like ought to have some option for
> > doing async.  It's a style of programming that has become a lot more
> > popular with the rise of Node.js, Twisted python, etc. etc.  Also, as
> > Ismael commented, Java 8 CompletableFuture is going to make Java's
> > support for fluent async programming a lot stronger by allowing call
> > chaining and much more.
> >
> > If we are going to support async, the simplest thing is just to make
> > everything return a future and let people call get() if they want to run
> > synchronously.  Having a mix of async and sync APIs is just going to be
> > confusing and redundant.
> >
> > I think we should try to avoid creating single functions that start
> > multiple requests if we can.  It makes things much uglier.  It means
> > that you have to have some kind of request class that wraps up the
> > request the user is trying to create, so that you can handle an array of
> > those requests.  The return value has to be something like Map<Node,
> > Try<Value>> to represent which nodes failed and succeeded.  This is the
> > kind of stuff that, in my opinion, makes people scratch their heads.
> >
> > If we need to, we can still get some of the efficiency benefits of batch
> > APIs by waiting for a millisecond or two before sending out a topic
> > create() request to see if other create() requests arrive.  If so, we
> > can coalesce them.  It might be better to figure out if this is an
> > actual performance issue before implementing it, though.
> >
> > I think it would be good to get something out there, annotate it as
> > @Unstable, and get feedback from people building against trunk and using
> > it.  We have removed or changed @Unstable APIs in streams before, so I
> > don't think we should worry that it will get set in stone prematurely.
> > The AdminClient API should get much less developer use than anything in
> > streams, so changing an unstable API should be much easier.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote:
> > > Thanks for elaborating Jay. I totally agree that we have to be very
> > > careful
> > > in how we use our complexity budget. Easier said than done when people
> > > don't agree on what is complex and what is simple. :) For example, I
> > > think
> > > batch APIs are a significant source of complexity as you have to do a
> > > bunch
> > > of ceremony to group things before sending the request and error handling
> > > becomes more complex due to partial failures (things like `Try` or other
> > > mechanisms that serve a similar role are then needed).
> > >
> > > Maybe a way forward is to write API usage examples to help validate that
> > > the suggested API is indeed easy to use.
> > >
> > > Ismael
> > >
> > > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <j...@confluent.io> wrote:
> > >
> > > > Totally agree on CompletableFuture. Also agree with some of the rough
> > edges
> > > > on the Consumer.
> > > >
> > > > I don't have much of a leg to stand on with the splitting vs not
> > splitting
> > > > thing, really hard to argue one or the other is better. I guess the one
> > > > observation in watching us try to make good public apis over the years
> > is I
> > > > am kind of in favor of a particular kind of simple. In particular I
> > think
> > > > since the bar is sooo high in support and docs and the community of
> > users
> > > > so broad in the range of their capabilities, it makes it so there is a
> > lot
> > > > of value in dead simple interfaces that don't have a lot of conceptual
> > > > weight, don't introduce a lot of new classes or concepts or general
> > > > patterns that must be understood to use them correctly. So things like
> > > > nesting, or the Try class, or async apis, or even just a complex set of
> > > > classes representing arguments or return values kind of all stack
> > > > conceptual burdens on the user to figure out correct usage. So like,
> > for
> > > > example, the Try class is very elegant and represents a whole
> > generalized
> > > > class of possibly completed actions, but the flip side is maybe I'm
> > just a
> > > > working guy who needs to list his kafka topics but doesn't know Rust,
> > take
> > > > pity on me! :-)
> > > >
> > > > Nit picking aside, super excited to see us progress on this.
> > > >
> > > > -Jay
> > > >
> > > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > Hi Jay,
> > > > >
> > > > > Thanks for the feedback. Comments inline.
> > > > >
> > > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > >
> > > > > >    - I think it would be good to not use "get" as the prefix for
> > things
> > > > > >    making remote calls. We've tried to avoid the java getter
> > convention
> > > > > >    entirely (see code style guide), but for remote calls in
> > particular
> > > > it
> > > > > > kind
> > > > > >    of blurs the line between field access and remote RPC in a way
> > that
> > > > > > leads
> > > > > >    people to trouble. What about, e.g., fetchAllGroups() vs
> > > > > getAllGroups().
> > > > > >
> > > > >
> > > > > Agreed that we should avoid the `get` prefix for remote calls. There
> > are
> > > > a
> > > > > few possible prefixes for the read operations: list, fetch, describe.
> > > > >
> > > > >
> > > > > >    - I think futures and callbacks are a bit of a pain to use. I'd
> > > > second
> > > > > >    Becket's comment: let's ensure there a common use case
> > motivating
> > > > > these
> > > > > >    that wouldn't be just as easily satisfied with batch operations
> > > > (which
> > > > > > we
> > > > > >    seem to have at least for some things). In terms of flexibility
> > > > > > Callbacks >
> > > > > >    Futures > Batch Ops but I think in terms of usability it is the
> > > > exact
> > > > > >    opposite so let's make sure we have worked out how the API will
> > be
> > > > > used
> > > > > >    before deciding. In particular I think java Futures are often an
> > > > > >    uncomfortable half-way point since calling get() and blocking
> > the
> > > > > > thread is
> > > > > >    often not what you want for chaining sequences of operations in
> > a
> > > > > truly
> > > > > >    async way, so 99% of people just use the future as a way to
> > batch
> > > > > calls.
> > > > > >
> > > > >
> > > > > We should definitely figure out how the APIs are going to be used
> > before
> > > > > deciding. I agree that callbacks are definitely painful and there's
> > > > little
> > > > > reason to expose them in a modern API unless it's meant to be very
> > low
> > > > > level. When it comes to Futures, I think it's important to
> > distinguish
> > > > what
> > > > > is available in Java 7 and below versus what is available from Java 8
> > > > > onwards. CompletableFuture makes it much easier to compose/chain
> > > > operations
> > > > > (in a similar vein to java.util.Stream, our own Streams API, etc.)
> > and it
> > > > > gives you the ability to register callbacks if you really want to
> > > > (avoiding
> > > > > the somewhat odd situation in the Producer where we return a Future
> > _and_
> > > > > allow you to pass a callback).
> > > > >
> > > > >
> > > > > >    - Personally I don't think splitting the admin methods up
> > actually
> > > > > makes
> > > > > >    things more usable. It just makes you have to dig through our
> > > > > > hierarchy. I
> > > > > >    think a flat class with a bunch of operations (like the consumer
> > > > api)
> > > > > is
> > > > > >    probably the easiest for people to grok and find things on. I am
> > > > kind
> > > > > > of a
> > > > > >    dumb PHP programmer at heart, though.
> > > > > >
> > > > >
> > > > > I am not sure it's fair to compare the AdminClient with the
> > Consumer. The
> > > > > former has APIs for a bunch of unrelated APIs (topics, ACLs, configs,
> > > > > consumer groups, delegation tokens, preferred leader election,
> > partition
> > > > > reassignment, etc.) where the latter is pretty specialised. For each
> > of
> > > > the
> > > > > resources, you may have 3-4 operations, it will get confusing fast.
> > Also,
> > > > > do you really think an API that has one level of grouping will mean
> > that
> > > > > users have to "dig through our hierarchy"? Or are you concerned that
> > once
> > > > > we go in that direction, there is a danger of making the hierarchy
> > more
> > > > > complicated?
> > > > >
> > > > > Finally, I am not sure I would use the consumer as an example of
> > > > something
> > > > > that is easy to grok. :) The fact that methods behave pretty
> > differently
> > > > > (some are blocking while others only have an effect after poll) with
> > no
> > > > > indication from the type signature or naming convention makes it
> > harder,
> > > > > not easier, to understand.
> > > > >
> > > > > Ismael
> > > > >
> > > >
> >

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Reply via email to