Hi Richard,

I made a pass over the KIP again, some more clarifications / comments:

1. seek() call itself is not blocking, only the following poll() call may
be blocking as the actually metadata rq will happen.

2. I saw you did not include Consumer.partitionFor(),
Consumer.OffsetAndTimestamp() and Consumer.listTopics() in your KIP. After
a second thought, I think this may be a better idea to not tackle them in
the same KIP, and probably we should consider whether we would change the
behavior or not in another discussion. So I agree to not include them.

3. In your wiki you mentioned "Another change shall be made to
KafkaConsumer#poll(), due to its call to updateFetchPositions() which
blocks indefinitely." This part may a bit obscure to most readers who's not
familiar with the KafkaConsumer internals, could you please add more
elaborations. More specifically, I think the root causes of the public APIs
mentioned are a bit different while the KIP's explanation sounds like they
are due to the same reason:

3.1 fetchCommittedOffsets(): this internal call will block forever if the
committed offsets cannot be fetched successfully and affect position() and
committed(). We need to break out of its internal while loop.
3.2 position() itself will while loop when offsets cannot be retrieved in
the underlying async call. We need to break out this while loop.
3.3 commitSync() passed Long.MAX_VALUE as the timeout value, we should take
the user specified timeouts when applicable.



Guozhang

On Sat, Mar 17, 2018 at 4:44 PM, Richard Yu <yohan.richard...@gmail.com>
wrote:

> Actually, what I said above is inaccurate. In
> testSeekAndCommitWithBrokerFailures, TestUtils.waitUntilTrue blocks, not
> seek.
> My assumption is that seek did not update correctly. I will be digging
> further into this.
>
>
>
> On Sat, Mar 17, 2018 at 4:16 PM, Richard Yu <yohan.richard...@gmail.com>
> wrote:
>
> > One more thing: when looking through tests, I have realized that seek()
> > methods can potentially block indefinitely. As you well know, seek() is
> > called when pollOnce() or position() is active. Thus, if position()
> blocks
> > indefinitely, then so would seek(). Should bounding seek() also be
> included
> > in this KIP?
> >
> > Thanks, Richard
> >
> > On Sat, Mar 17, 2018 at 1:16 PM, Richard Yu <yohan.richard...@gmail.com>
> > wrote:
> >
> >> Thanks for the advice, Jason
> >>
> >> I have modified KIP-266 to include the java doc for committed() and
> other
> >> blocking methods, and I also
> >> mentioned poll() which will also be bounded. Let me know if there is
> >> anything else. :)
> >>
> >> Sincerely, Richard
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Mar 17, 2018 at 12:00 PM, Jason Gustafson <ja...@confluent.io>
> >> wrote:
> >>
> >>> Hi Richard,
> >>>
> >>> Thanks for the updates. I'm really glad you picked this up. A couple
> >>> minor
> >>> comments:
> >>>
> >>> 1. Can you list the full set of new APIs explicitly in the KIP?
> >>> Currently I
> >>> only see the javadoc for `position()`.
> >>>
> >>> 2. We should consider adding `TimeUnit` to the new methods to avoid
> unit
> >>> confusion. I know it's inconsistent with the poll() API, but I think it
> >>> was
> >>> probably a mistake not to include it there, so better not to double
> down
> >>> on
> >>> that mistake. And note that we do already have `close(long, TimeUnit)`.
> >>>
> >>> Other than that, I think the current KIP seems reasonable.
> >>>
> >>> Thanks,
> >>> Jason
> >>>
> >>> On Wed, Mar 14, 2018 at 5:00 PM, Richard Yu <
> yohan.richard...@gmail.com>
> >>> wrote:
> >>>
> >>> > Note to all: I have included bounding commitSync() and committed() in
> >>> this
> >>> > KIP.
> >>> >
> >>> > On Sun, Mar 11, 2018 at 5:05 PM, Richard Yu <
> >>> yohan.richard...@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > I updated the KIP where overloading position() is now the favored
> >>> > approach.
> >>> > > Bounding position() using requestTimeoutMs has been listed as
> >>> rejected.
> >>> > >
> >>> > > Any thoughts?
> >>> > >
> >>> > > On Tue, Mar 6, 2018 at 6:00 PM, Guozhang Wang <wangg...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > >> I agree that adding the overloads is most flexible. But going for
> >>> that
> >>> > >> direction we'd do that for all the blocking call that I've listed
> >>> above,
> >>> > >> with this timeout value covering the end-to-end waiting time.
> >>> > >>
> >>> > >>
> >>> > >> Guozhang
> >>> > >>
> >>> > >> On Tue, Mar 6, 2018 at 10:02 AM, Ted Yu <yuzhih...@gmail.com>
> >>> wrote:
> >>> > >>
> >>> > >> > bq. The most flexible option is to add overloads to the consumer
> >>> > >> >
> >>> > >> > This option is flexible.
> >>> > >> >
> >>> > >> > Looking at the tail of SPARK-18057, Spark dev voiced the same
> >>> choice.
> >>> > >> >
> >>> > >> > +1 for adding overload with timeout parameter.
> >>> > >> >
> >>> > >> > Cheers
> >>> > >> >
> >>> > >> > On Mon, Mar 5, 2018 at 2:42 PM, Jason Gustafson <
> >>> ja...@confluent.io>
> >>> > >> > wrote:
> >>> > >> >
> >>> > >> > > @Guozhang I probably have suggested all options at some point
> or
> >>> > >> another,
> >>> > >> > > including most recently, the current KIP! I was thinking that
> >>> > >> practically
> >>> > >> > > speaking, the request timeout defines how long the user is
> >>> willing
> >>> > to
> >>> > >> > wait
> >>> > >> > > for a response. The consumer doesn't really have a complex
> send
> >>> > >> process
> >>> > >> > > like the producer for any of these APIs, so I wasn't sure how
> >>> much
> >>> > >> > benefit
> >>> > >> > > there would be from having more granular control over timeouts
> >>> (in
> >>> > the
> >>> > >> > end,
> >>> > >> > > KIP-91 just adds a single timeout to control the whole send).
> >>> That
> >>> > >> said,
> >>> > >> > it
> >>> > >> > > might indeed be better to avoid overloading the config as you
> >>> > suggest
> >>> > >> > since
> >>> > >> > > at least it avoids inconsistency with the producer's usage.
> >>> > >> > >
> >>> > >> > > The most flexible option is to add overloads to the consumer
> so
> >>> that
> >>> > >> > users
> >>> > >> > > can pass the timeout directly. I'm not sure if that is more or
> >>> less
> >>> > >> > > annoying than a new config, but I've found config timeouts a
> >>> little
> >>> > >> > > constraining in practice. For example, I could imagine users
> >>> wanting
> >>> > >> to
> >>> > >> > > wait longer for an offset commit operation than a position
> >>> lookup;
> >>> > if
> >>> > >> the
> >>> > >> > > latter isn't timely, users can just pause the partition and
> >>> continue
> >>> > >> > > fetching on others. If you cannot commit offsets, however, it
> >>> might
> >>> > be
> >>> > >> > > safer for an application to wait availability of the
> coordinator
> >>> > than
> >>> > >> > > continuing.
> >>> > >> > >
> >>> > >> > > -Jason
> >>> > >> > >
> >>> > >> > > On Sun, Mar 4, 2018 at 10:14 PM, Guozhang Wang <
> >>> wangg...@gmail.com>
> >>> > >> > wrote:
> >>> > >> > >
> >>> > >> > > > Hello Richard,
> >>> > >> > > >
> >>> > >> > > > Thanks for the proposed KIP. I have a couple of general
> >>> comments:
> >>> > >> > > >
> >>> > >> > > > 1. I'm not sure if piggy-backing the timeout exception on
> the
> >>> > >> > > > existing requestTimeoutMs configured in "request.timeout.ms
> "
> >>> is a
> >>> > >> good
> >>> > >> > > > idea
> >>> > >> > > > since a) it is a general config that applies for all types
> of
> >>> > >> requests,
> >>> > >> > > and
> >>> > >> > > > 2) using it to cover all the phases of an API call,
> including
> >>> > >> network
> >>> > >> > > round
> >>> > >> > > > trip and potential metadata refresh is shown to not be a
> good
> >>> > idea,
> >>> > >> as
> >>> > >> > > > illustrated in KIP-91:
> >>> > >> > > >
> >>> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> > >> > > > 91+Provide+Intuitive+User+Timeouts+in+The+Producer
> >>> > >> > > >
> >>> > >> > > > In fact, I think in KAFKA-4879 which is aimed for the same
> >>> issue
> >>> > as
> >>> > >> > > > KAFKA-6608,
> >>> > >> > > > Jason has suggested we use a new config for the API. Maybe
> >>> this
> >>> > >> would
> >>> > >> > be
> >>> > >> > > a
> >>> > >> > > > more intuitive manner than reusing the request.timeout.ms
> >>> config.
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > > 2. Besides the Consumer.position() call, there are a couple
> of
> >>> > more
> >>> > >> > > > blocking calls today that could result in infinite blocking:
> >>> > >> > > > Consumer.commitSync() and Consumer.committed(), should they
> be
> >>> > >> > considered
> >>> > >> > > > in this KIP as well?
> >>> > >> > > >
> >>> > >> > > > 3. There are a few other APIs that are today relying on
> >>> > >> > > request.timeout.ms
> >>> > >> > > > already for breaking the infinite blocking, namely
> >>> > >> > > Consumer.partitionFor(),
> >>> > >> > > > Consumer.OffsetAndTimestamp() and Consumer.listTopics(), if
> >>> we are
> >>> > >> > making
> >>> > >> > > > the other blocking calls to be relying a new config as
> >>> suggested
> >>> > in
> >>> > >> 1)
> >>> > >> > > > above, should we also change the semantics of these API
> >>> functions
> >>> > >> for
> >>> > >> > > > consistency?
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > > Guozhang
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > > On Sun, Mar 4, 2018 at 11:13 AM, Richard Yu <
> >>> > >> > yohan.richard...@gmail.com>
> >>> > >> > > > wrote:
> >>> > >> > > >
> >>> > >> > > > > Hi all,
> >>> > >> > > > >
> >>> > >> > > > > I would like to discuss a potential change which would be
> >>> made
> >>> > to
> >>> > >> > > > > KafkaConsumer:
> >>> > >> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> >>> > >> > > > action?pageId=75974886
> >>> > >> > > > >
> >>> > >> > > > > Thanks,
> >>> > >> > > > > Richard Yu
> >>> > >> > > > >
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > >
> >>> > >> > > > --
> >>> > >> > > > -- Guozhang
> >>> > >> > > >
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> --
> >>> > >> -- Guozhang
> >>> > >>
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>



-- 
-- Guozhang

Reply via email to