On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote: > Hey Colin, > > Thanks for the hard work on this. I know going back and forth on APIs is > kind of frustrating but we're at the point where these things live long > enough and are used by enough people that it is worth the pain. I'm sure > it'll come down in the right place eventually. A couple things I've found > helped in the past: > > 1. The burden of evidence needs to fall on the complicator. i.e. if > person X thinks the api should be async they need to produce a set of > common use cases that require this. Otherwise you are perpetually > having to > think "we might need x". I think it is good to have a rule of "simple > until > proven insufficient". > 2. Make sure we frame things for the intended audience. At this point > our apis get used by a very broad set of Java engineers. This is a > very > different audience from our developer mailing list. These people code > for a > living not necessarily as a passion, and may not understand details of > the > internals of our system or even basic things like multi-threaded > programming. I don't think this means we want to dumb things down, but > rather try really hard to make things truly simple when possible. > > Okay here were a couple of comments: > > 1. Conceptually what is a TopicContext? I think it means something > like > TopicAdmin? It is not literally context about Topics right? What is > the > relationship of Contexts to clients? Is there a threadsafety > difference? > Would be nice to not have to think about this, this is what I mean by > "conceptual weight". We introduce a new concept that is a bit nebulous > that > I have to figure out to use what could be a simple api. I'm sure > you've > been through this experience before where you have these various > objects > and you're trying to figure out what they represent (the connection to > the > server? the information to create a connection? a request session?).
The intention was to provide some grouping of methods, and also a place to put request parameters which were often set to defaults rather than being explicitly set. If it seems complex, we can certainly get rid of it. > 2. We've tried to avoid the Impl naming convention. In general the > rule > has been if there is only going to be one implementation you don't > need an > interface. If there will be multiple, distinguish it from the others. > The > other clients follow this pattern: Producer, KafkaProducer, > MockProducer; > Consumer, KafkaConsumer, MockConsumer. Good point. Let's change the interface to KafkaAdminClient, and the implementation to AdminClient. > 3. We generally don't use setters or getters as a naming convention. I > personally think mutating the setting in place seems kind of like late > 90s > Java style. I think it likely has thread-safety issues. i.e. even if > it is > volatile you may not get the value you just set if there is another > thread... I actually really liked what you described as your original > idea > of having a single parameter object like CreateTopicRequest that holds > all > these parameters and defaults. This lets you evolve the api with all > the > various combinations of arguments without overloading insanity. After > doing > literally tens of thousands of remote APIs at LinkedIn we eventually > converged on a rule, which is ultimately every remote api needs a > single > argument object you can add to over time and it must be batched. Which > brings me to my next point... Just to clarify, volatiles were never a part of the proposal. I think that context objects or request objects should be used by a single thread at a time. I'm not opposed to request objects, but I think they raise all the same questions as context objects. Basically, the thread-safety issues need to be spelled out and understood by the user, and the user needs more lines of code to make a request. And there will be people trying to do things like re-use request objects when they should not, and so forth. > 4. I agree batch apis are annoying but I suspect we'll end up adding > one. Doing 1000 requests for 1000 operations if creating or deleting > will > be bad, right? This won't be the common case, but when you do it it > will be > a deal-breaker problem. I don't think we should try to fix this one > behind > the scenes. > 5. Are we going to do CompletableFuture (which requires java 8) or > normal Future? Normal Future is utterly useless for most things other > than > just calling wait. If we can evolve in place from Future to > CompletableFuture that is fantastic (we could do it for the producer > too!). > My belief was that this was binary incompatible but I actually don't > know > (obviously it's source compatible). In my testing, replacing a return type with a subclass of that return type did not break binary compatibility. I haven't been able to find chapter and verse on this from the Java implementers, though. best, Colin > > -Jay > > On Wed, Feb 8, 2017 at 5:00 PM, Colin McCabe <cmcc...@apache.org> wrote: > > > Hi all, > > > > I made some major revisions to the proposal on the wiki, so please check > > it out. > > > > The new API is based on Ismael's suggestion of grouping related APIs. > > There is only one layer of grouping. I think that it's actually pretty > > intuitive. It's also based on the idea of using Futures, which several > > people commented that they'd like to see. > > > > Here's a simple example: > > > > > AdminClient client = new AdminClientImpl(myConfig); > > > try { > > > client.topics().create("foo", 3, (short) 2, false).get(); > > > Collection<String> topicNames = client.topics().list(false).get(); > > > log.info("Found topics: {}", Utils.mkString(topicNames, ", ")); > > > Collection<Node> nodes = client.nodes().list().get(); > > > log.info("Found cluster nodes: {}", Utils.mkString(nodes, ", ")); > > > } finally { > > > client.close(); > > > } > > > > The good thing is, there is no Try, no 'get' prefixes, no messing with > > batch APIs. If there is an error, then Future#get() throws an > > ExecutionException which wraps the relevant exception in the standard > > Java way. > > > > Here's a slightly less simple example: > > > > > AdminClient client = new AdminClientImpl(myConfig); > > > try { > > > List<Future<Void>> futures = new LinkedList<>(); > > > for (String topicName: myNewTopicNames) { > > > creations.add(client.topics(). > > > setClientTimeout(30000).setCreationConfig(myTopicConfig). > > > create(topicName, 3, (short) 2, false)); > > > } > > > Futures.waitForAll(futures); > > > } finally { > > > client.close(); > > > } > > > > I went with Futures because I feel like ought to have some option for > > doing async. It's a style of programming that has become a lot more > > popular with the rise of Node.js, Twisted python, etc. etc. Also, as > > Ismael commented, Java 8 CompletableFuture is going to make Java's > > support for fluent async programming a lot stronger by allowing call > > chaining and much more. > > > > If we are going to support async, the simplest thing is just to make > > everything return a future and let people call get() if they want to run > > synchronously. Having a mix of async and sync APIs is just going to be > > confusing and redundant. > > > > I think we should try to avoid creating single functions that start > > multiple requests if we can. It makes things much uglier. It means > > that you have to have some kind of request class that wraps up the > > request the user is trying to create, so that you can handle an array of > > those requests. The return value has to be something like Map<Node, > > Try<Value>> to represent which nodes failed and succeeded. This is the > > kind of stuff that, in my opinion, makes people scratch their heads. > > > > If we need to, we can still get some of the efficiency benefits of batch > > APIs by waiting for a millisecond or two before sending out a topic > > create() request to see if other create() requests arrive. If so, we > > can coalesce them. It might be better to figure out if this is an > > actual performance issue before implementing it, though. > > > > I think it would be good to get something out there, annotate it as > > @Unstable, and get feedback from people building against trunk and using > > it. We have removed or changed @Unstable APIs in streams before, so I > > don't think we should worry that it will get set in stone prematurely. > > The AdminClient API should get much less developer use than anything in > > streams, so changing an unstable API should be much easier. > > > > best, > > Colin > > > > > > On Wed, Feb 8, 2017, at 07:49, Ismael Juma wrote: > > > Thanks for elaborating Jay. I totally agree that we have to be very > > > careful > > > in how we use our complexity budget. Easier said than done when people > > > don't agree on what is complex and what is simple. :) For example, I > > > think > > > batch APIs are a significant source of complexity as you have to do a > > > bunch > > > of ceremony to group things before sending the request and error handling > > > becomes more complex due to partial failures (things like `Try` or other > > > mechanisms that serve a similar role are then needed). > > > > > > Maybe a way forward is to write API usage examples to help validate that > > > the suggested API is indeed easy to use. > > > > > > Ismael > > > > > > On Wed, Feb 8, 2017 at 4:40 AM, Jay Kreps <j...@confluent.io> wrote: > > > > > > > Totally agree on CompletableFuture. Also agree with some of the rough > > edges > > > > on the Consumer. > > > > > > > > I don't have much of a leg to stand on with the splitting vs not > > splitting > > > > thing, really hard to argue one or the other is better. I guess the one > > > > observation in watching us try to make good public apis over the years > > is I > > > > am kind of in favor of a particular kind of simple. In particular I > > think > > > > since the bar is sooo high in support and docs and the community of > > users > > > > so broad in the range of their capabilities, it makes it so there is a > > lot > > > > of value in dead simple interfaces that don't have a lot of conceptual > > > > weight, don't introduce a lot of new classes or concepts or general > > > > patterns that must be understood to use them correctly. So things like > > > > nesting, or the Try class, or async apis, or even just a complex set of > > > > classes representing arguments or return values kind of all stack > > > > conceptual burdens on the user to figure out correct usage. So like, > > for > > > > example, the Try class is very elegant and represents a whole > > generalized > > > > class of possibly completed actions, but the flip side is maybe I'm > > just a > > > > working guy who needs to list his kafka topics but doesn't know Rust, > > take > > > > pity on me! :-) > > > > > > > > Nit picking aside, super excited to see us progress on this. > > > > > > > > -Jay > > > > > > > > On Tue, Feb 7, 2017 at 3:46 PM Ismael Juma <ism...@juma.me.uk> wrote: > > > > > > > > > Hi Jay, > > > > > > > > > > Thanks for the feedback. Comments inline. > > > > > > > > > > On Tue, Feb 7, 2017 at 8:18 PM, Jay Kreps <j...@confluent.io> wrote: > > > > > > > > > > > > - I think it would be good to not use "get" as the prefix for > > things > > > > > > making remote calls. We've tried to avoid the java getter > > convention > > > > > > entirely (see code style guide), but for remote calls in > > particular > > > > it > > > > > > kind > > > > > > of blurs the line between field access and remote RPC in a way > > that > > > > > > leads > > > > > > people to trouble. What about, e.g., fetchAllGroups() vs > > > > > getAllGroups(). > > > > > > > > > > > > > > > > Agreed that we should avoid the `get` prefix for remote calls. There > > are > > > > a > > > > > few possible prefixes for the read operations: list, fetch, describe. > > > > > > > > > > > > > > > > - I think futures and callbacks are a bit of a pain to use. I'd > > > > second > > > > > > Becket's comment: let's ensure there a common use case > > motivating > > > > > these > > > > > > that wouldn't be just as easily satisfied with batch operations > > > > (which > > > > > > we > > > > > > seem to have at least for some things). In terms of flexibility > > > > > > Callbacks > > > > > > > Futures > Batch Ops but I think in terms of usability it is the > > > > exact > > > > > > opposite so let's make sure we have worked out how the API will > > be > > > > > used > > > > > > before deciding. In particular I think java Futures are often an > > > > > > uncomfortable half-way point since calling get() and blocking > > the > > > > > > thread is > > > > > > often not what you want for chaining sequences of operations in > > a > > > > > truly > > > > > > async way, so 99% of people just use the future as a way to > > batch > > > > > calls. > > > > > > > > > > > > > > > > We should definitely figure out how the APIs are going to be used > > before > > > > > deciding. I agree that callbacks are definitely painful and there's > > > > little > > > > > reason to expose them in a modern API unless it's meant to be very > > low > > > > > level. When it comes to Futures, I think it's important to > > distinguish > > > > what > > > > > is available in Java 7 and below versus what is available from Java 8 > > > > > onwards. CompletableFuture makes it much easier to compose/chain > > > > operations > > > > > (in a similar vein to java.util.Stream, our own Streams API, etc.) > > and it > > > > > gives you the ability to register callbacks if you really want to > > > > (avoiding > > > > > the somewhat odd situation in the Producer where we return a Future > > _and_ > > > > > allow you to pass a callback). > > > > > > > > > > > > > > > > - Personally I don't think splitting the admin methods up > > actually > > > > > makes > > > > > > things more usable. It just makes you have to dig through our > > > > > > hierarchy. I > > > > > > think a flat class with a bunch of operations (like the consumer > > > > api) > > > > > is > > > > > > probably the easiest for people to grok and find things on. I am > > > > kind > > > > > > of a > > > > > > dumb PHP programmer at heart, though. > > > > > > > > > > > > > > > > I am not sure it's fair to compare the AdminClient with the > > Consumer. The > > > > > former has APIs for a bunch of unrelated APIs (topics, ACLs, configs, > > > > > consumer groups, delegation tokens, preferred leader election, > > partition > > > > > reassignment, etc.) where the latter is pretty specialised. For each > > of > > > > the > > > > > resources, you may have 3-4 operations, it will get confusing fast. > > Also, > > > > > do you really think an API that has one level of grouping will mean > > that > > > > > users have to "dig through our hierarchy"? Or are you concerned that > > once > > > > > we go in that direction, there is a danger of making the hierarchy > > more > > > > > complicated? > > > > > > > > > > Finally, I am not sure I would use the consumer as an example of > > > > something > > > > > that is easy to grok. :) The fact that methods behave pretty > > differently > > > > > (some are blocking while others only have an effect after poll) with > > no > > > > > indication from the type signature or naming convention makes it > > harder, > > > > > not easier, to understand. > > > > > > > > > > Ismael > > > > > > > > > > >