Guozhang, Perhaps we can discuss this in our KIP hangout next week?
Thanks, Jun On Tue, Jun 9, 2015 at 1:12 PM, Guozhang Wang <wangg...@gmail.com> wrote: > This email is to kick-off some discussion around the changes we want to > make on the new consumer APIs as well as their semantics. Here are a > not-comprehensive list of items in my mind: > > 1. Poll(timeout): current definition of timeout states "The time, in > milliseconds, spent waiting in poll if data is not available. If 0, waits > indefinitely." While in the current implementation, we have different > semantics as stated, for example: > > a) poll(timeout) can return before "timeout" elapsed with empty consumed > data. > b) poll(timeout) can return after more than "timeout" elapsed due to > blocking event like join-group, coordinator discovery, etc. > > We should think a bit more on what semantics we really want to provide and > how to provide it in implementation. > > 2. Thread safeness: currently we have a coarsen-grained locking mechanism > that provides thread safeness but blocks commit / position / etc calls > while poll() is in process. We are considering to remove the > coarsen-grained locking with an additional Consumer.wakeup() call to break > the polling, and instead suggest users to have one consumer client per > thread, which aligns with the design of a single-threaded consumer > (KAFKA-2123). > > 3. Commit(): we want to improve the async commit calls to add a callback > handler upon commit completes, and guarantee ordering of commit calls with > retry policies (KAFKA-2168). In addition, we want to extend the API to > expose attaching / fetching offset metadata stored in the Kafka offset > manager. > > 4. OffsetFetchRequest: currently for handling OffsetCommitRequest we check > the generation id and the assigned partitions before accepting the request > if the group is using Kafka for partition management, but for > OffsetFetchRequest we cannot do this checking since it does not include > groupId / consumerId information. Do people think this is OK or we should > add this as we did in OffsetCommitRequest? > > 5. New APIs: there are some other requests to add: > > a) offsetsBeforeTime(timestamp): or would seekToEnd and seekToBeginning > sufficient? > > b) listTopics(): or should we just enforce users to use AdminUtils for such > operations? > > There may be other issues that I have missed here, so folks just bring it > up if you thought about anything else. > > -- Guozhang >