Hi Becket,
I made some changes and clarified the motivation for this KIP. :) It should be
easier to understand now since I included a diagram.
Thanks,
Richard Yu
On Tuesday, July 17, 2018, 4:38:11 PM GMT+8, Richard Yu
<[email protected]> wrote:
Hi Becket,
Thanks for reviewing this KIP. :)
I probably did not explicitly state what we were trying to avoid by introducing
this mode. As mentioned in the KIP, an offset lag can build up after a crash.
Our main goal is to avoid this lag (i.e. the latency, in terms of time, that
results from the crash), not to reduce the number of records reprocessed.
I could provide a couple of diagrams to show what I am envisioning, because some
points in my KIP might otherwise be hard to grasp (I will also include some
diagrams to give you a better idea of a use case). As for your questions, here
are a couple of answers:
1. Yes, the two consumers will in fact be processing in parallel. We do this
because we want to accelerate the processing speed of the records to make up
for the latency caused by the crash.
2. After the recovery point, records will not be processed twice. Let me
describe the scenario I was envisioning: we would let the consumer that crashed
seek to the end of the log using KafkaConsumer#seekToEnd. Meanwhile, a
secondary consumer will start processing from the latest checkpointed offset
and continue until it has hit the place where the first consumer that crashed
began processing after seekToEnd was first called. Since the consumer that
crashed skipped from the recovery point to the end of the log, the intermediate
offsets will be processed only by the secondary consumer. The offset ranges
that the two threads process therefore do not overlap, which prevents any
offset from being processed more than once.
3. As for the committed offsets, rewinding is unlikely. If my understanding is
correct, you are probably worried that after the crash, offsets that have
already been committed will be committed again. The current design prevents
that from happening, as the policy of where to start processing after a crash
is universal across all Consumer instances -- we will begin processing from the
latest offset committed.
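To make point 2 more concrete, here is a minimal sketch of how the offset
ranges would be split between the two consumers. It is a plain Python
simulation, not Kafka client code: the helper name split_offset_ranges and the
example offsets (last commit at 100, log end at 250) are assumptions made up
for illustration.

```python
def split_offset_ranges(last_committed, log_end_at_recovery):
    """Return the offset ranges of the two consumers after a crash.

    Hypothetical helper for illustration only; it is not part of the
    Kafka client API. Ranges are half-open: [start, end).
    """
    if last_committed > log_end_at_recovery:
        raise ValueError("committed offset cannot be past the log end")
    # Secondary consumer: works through the backlog left by the crash,
    # from the last committed offset up to the recovery-time log end.
    secondary = (last_committed, log_end_at_recovery)
    # Recovered consumer: jumped to the log end (as with seekToEnd), so it
    # only sees records appended from that offset onward (no upper bound).
    primary = (log_end_at_recovery, None)
    return secondary, primary


# Example numbers are assumptions: last commit at 100, log end at 250.
secondary, primary = split_offset_ranges(100, 250)
backlog = range(*secondary)            # offsets 100..249
new_records = range(primary[0], 300)   # offsets 250..299, if 50 more arrive

# The two ranges share no offset, and together they cover 100..299 with no gap.
assert set(backlog).isdisjoint(new_records)
assert sorted(list(backlog) + list(new_records)) == list(range(100, 300))
```

The secondary consumer stops at the exact offset where the recovered consumer
started, so no offset falls into both ranges.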
I hope that you at least got some of your questions answered. I will update the
KIP soon, so please stay tuned.
Thanks,
Richard Yu
On Tuesday, July 17, 2018, 2:14:07 PM GMT+8, Becket Qin
<[email protected]> wrote:
Hi Richard,
Thanks for the KIP. I am a little confused about what is proposed. The KIP
suggests that after recovery from a consumer crash, there will be two
consumers consuming from the same partition. One consumes starting from the
log end offset at the point of recovery, and another consumes starting from
the last committed offset and keeps consuming in parallel with the first
consumer? Does that mean the messages after the recovery point will be
consumed twice? If those two consumers commit offsets, does that mean the
committed offsets may rewind?
The proposal sounds a little hacky and introduces some non-deterministic
behavior. It would be useful to have a concrete use case example to explain
what is actually needed. If the goal is to reduce the number of records
that are reprocessed when a consumer crashes, maybe we can have an auto commit
interval based on the number of messages. If the application just wants to read
from the end of the log after recovery from a crash, would calling seekToEnd
explicitly work?
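As a rough illustration of the message-count-based commit idea, here is a
minimal sketch. It is a plain Python simulation rather than Kafka client code,
and the class name CountBasedCommitter and the parameter commit_every are
hypothetical, not an existing consumer setting.

```python
class CountBasedCommitter:
    """Simulates committing after every `commit_every` processed records.

    Hypothetical illustration only; not part of the Kafka client API.
    """

    def __init__(self, commit_every, start_offset=0):
        if commit_every < 1:
            raise ValueError("commit_every must be at least 1")
        self.commit_every = commit_every
        self.committed = start_offset  # where a restarted consumer resumes
        self.position = start_offset   # next offset to process

    def process(self, count=1):
        """Process `count` records, committing whenever the threshold is hit."""
        for _ in range(count):
            self.position += 1
            if self.position - self.committed >= self.commit_every:
                self.committed = self.position

    def reprocessed_after_crash(self):
        """Records that would be read again if the consumer crashed now."""
        return self.position - self.committed


committer = CountBasedCommitter(commit_every=100)
committer.process(250)

# After 250 records, commits happened at offsets 100 and 200, so a crash
# at this point would replay the 50 records processed since offset 200.
assert committer.committed == 200
assert committer.reprocessed_after_crash() == 50
```

With this policy the number of reprocessed records stays below commit_every
regardless of how quickly messages arrive, which a time-based interval does
not guarantee.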
Thanks,
Jiangjie (Becket) Qin
On Thu, Jul 5, 2018 at 6:46 PM, Richard Yu <[email protected]>
wrote:
> Hi all,
>
> I would like to discuss KIP-333 (which proposes a faster mode of
> rebalancing).
> Here is the link for the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing
>
> Thanks,
> Richard Yu
>