Re: [DISCUSS] KIP-333 Consider a faster form of rebalancing

Richard Yu Tue, 17 Jul 2018 01:38:15 -0700

 Hi Becket,
Thanks for reviewing this KIP. :)
I probably did not explicitly state what we were trying to avoid by introducing 
this mode. As mentioned in the KIP, there is a offset lag which could result 
after a crash. Our main goal is to avoid this lag (i.e. the latency in terms of 
time that results from the crash, not to reduce the number of records 
reprocessed).
I could provide a couple of diagrams with what I am envisioning because some 
points in my KIP might otherwise be hard to grasp (I will also include some 
diagrams to give you a better idea of an use case). As for your questions, I 
could provide a couple of answers:
1. Yes, the two consumers will in fact be processing in parallel. We do this 
because we want to accelerate the processing speed of the records to make up 
for the latency caused by the crash.
2. After the recovery point, records will not be processed twice. Let me 
describe the scenario I was envisioning: we would let the consumer that crashed 
seek to the end of the log using KafkaConsumer#seekToEnd. Meanwhile, a 
secondary consumer will start processing from the latest checkpointed offset 
and continue until it  has hit the place where the first consumer that crashed 
began processing after seekToEnd was first called. Since the consumer that 
crashed skipped from the recovery point to the end of the log, the intermediate 
offsets will be processed only by the secondary consumer. So it is important to 
note that the offset ranges which the two threads process will not overlap. 
(This is important as it prevents offsets from being processed more than once)

3. As for the committed offsets, the possibility of rewinding is not likely. If 
my understanding is correct, you are probably worried that after the crash, 
offsets that has already been previously committed will be committed again. The 
current design prevents that from happening, as the policy of where to start 
processing after a crash is universal across all Consumer instances -- we will 
begin processing from the latest offset committed. 

I hope that you at least got some of your questions answered. I will update the 
KIP soon, so please stay tuned.  

Thanks,Richard Yu
    On Tuesday, July 17, 2018, 2:14:07 PM GMT+8, Becket Qin 
<[email protected]> wrote:  

 Hi Richard,

Thanks for the KIP. I am a little confused on what is proposed. The KIP
suggests that after recovery from a consumer crash, there will be two
consumers consuming from the same partition. One consumes starting from the
log end offset at the point of recovery, and another consumes starting from
the last committed offset and keeping consuming with the first consumer in
parallel? Does that mean the messages after the recovery point will be
consumed twice? If those two consumer commits offsets, does that mean the
committed offsets may rewind?

The proposal sounds a little hacky and introduce some non-deterministic
behavior. It would be useful to have a concrete use case example to explain
what is actually needed. If the goal is to reduce the number of records
that are reprocessed when consume crashes, maybe we can have an auto commit
interval based on number of messages. If the application just wants to read
from the end of the log after recovery from crash, would calling seekToEnd
explicitly work?

Thanks,

Jiangjie (Becket) Qin

On Thu, Jul 5, 2018 at 6:46 PM, Richard Yu <[email protected]>
wrote:

> Hi all,
>
> I would like to discuss KIP-333 (which proposes a faster mode of
> rebalancing).
> Here is the link for the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 333%3A+Add+faster+mode+of+rebalancing
>
> Thanks,
> Richard Yu
>

Re: [DISCUSS] KIP-333 Consider a faster form of rebalancing

Reply via email to