Long time no see. This issue has been discussed for quite a while, so please 
allow me to summarize it; then everyone can help decide which direction it 
should take.

There are two problems to be solved by this KIP:
1. Solve the problem that, when the client configures "auto.offset.reset" to 
latest and the topic's partitions are expanded, data written to the new 
partitions may be lost because the consumer resets its offset to the latest 
offset.

2. In addition to the "earliest", "latest", and "none" values offered by the 
existing "auto.offset.reset", provide richer reset strategies, such as:
    a. "latest_on_start": reset to latest when the application starts, and 
throw an exception if an out-of-range offset occurs afterwards.
    b. "earliest_on_start": reset to earliest when the application starts, and 
throw an exception if an out-of-range offset occurs afterwards.
    c. "nearest": on application start, behave according to 
"auto.offset.reset"; when an out-of-range offset occurs, choose earliest or 
latest depending on whether the current offset is closer to the log start 
offset or the log end offset (see the sketch after this list).
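
To make the "nearest" semantics concrete, here is a minimal sketch of how the 
reset target could be chosen when an out-of-range offset occurs. The names are 
illustrative only and are not part of any existing Kafka API:

    // Illustrative sketch only; not part of the Kafka consumer API.
    public class NearestResetSketch {
        enum ResetTarget { EARLIEST, LATEST }

        static ResetTarget nearestReset(long fetchOffset, long logStartOffset, long logEndOffset) {
            // Distance from the out-of-range fetch offset to each end of the log.
            long distanceToStart = Math.abs(fetchOffset - logStartOffset);
            long distanceToEnd = Math.abs(logEndOffset - fetchOffset);
            // Jump to whichever end of the log is closer to where the consumer tried to read.
            return distanceToStart <= distanceToEnd ? ResetTarget.EARLIEST : ResetTarget.LATEST;
        }
    }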

Judging from the discussion above, the main concerns about adding these 
additional offset reset mechanisms are complexity and compatibility, even 
though the new parameters do bring real benefits. Based on that discussion, I 
have sorted out two possible directions; please help me decide which one to 
follow:

1. The first direction follows Guozhang's suggestion: keep the three values of 
"auto.offset.reset" and their meanings unchanged, which avoids confusing Kafka 
users and also avoids compatibility problems, and add these two parameters:
    a. "auto.offset.reset.on.no.initial.offset": the strategy used to 
initialize the offset when no committed offset exists. Its default value is 
whatever "auto.offset.reset" is configured to; in that case the offset 
initialization behavior is unchanged from today, ensuring compatibility. If it 
is configured to "latest_on_start" or "earliest_on_start", the offset is reset 
according to that semantics during initialization. This also solves the data 
loss problem during partition expansion: configure 
"auto.offset.reset.on.no.initial.offset" to "latest_on_start" and 
"auto.offset.reset" to earliest.
    b. "auto.offset.reset.on.invalid.offset": Indicates that the offset is 
illegal or out of range occurs. The default value is the parameter configured 
by "auto.offset.reset". If so, the processing of out of range is the same as 
before to ensure compatibility. If "nearest" is configured, then the semantic 
logic corresponding to "nearest" is used only for the case of out of range.

This solution preserves compatibility and keeps the semantics of the original 
configuration unchanged; only two incremental configurations are added to 
handle the different situations flexibly.
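
For illustration, a hedged configuration sketch assuming direction 1 were 
adopted. The two ".on.no.initial.offset" / ".on.invalid.offset" keys are the 
proposed names from this thread and do not exist in any released Kafka version:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ProposedResetConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "demo-group");

            // Existing config: partitions discovered later (e.g. after expansion)
            // fall back to earliest, so records produced to them before the first
            // fetch are not skipped.
            props.put("auto.offset.reset", "earliest");

            // Proposed (a): on the very first start, when no committed offset exists,
            // begin from the latest offset instead of replaying the whole topic.
            props.put("auto.offset.reset.on.no.initial.offset", "latest_on_start");

            // Proposed (b): when an out-of-range offset occurs, jump to whichever of
            // earliest/latest is closer to the current position.
            props.put("auto.offset.reset.on.invalid.offset", "nearest");

            KafkaConsumer<String, String> consumer =
                    new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
            // ... subscribe and poll as usual ...
            consumer.close();
        }
    }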

2. The second direction is to reduce the complexity of the problem: simply 
fold the logic of resetting the initial offset of a newly expanded partition 
to earliest into the existing "auto.offset.reset"="latest" behavior. Kafka 
users then do not need to be aware of this subtle but useful change, and all 
other situations are handled exactly as before (without introducing the richer 
offset reset mechanisms).
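
For comparison, a rough sketch of the decision direction 2 implies. This is 
not actual Kafka consumer internals, and the "partition created after 
subscribe" check is an assumption about how the consumer would detect an 
expanded partition:

    // Illustrative sketch only; not actual Kafka consumer internals.
    public class ExpandedPartitionResetSketch {
        enum OffsetResetStrategy { EARLIEST, LATEST, NONE }

        static OffsetResetStrategy initialResetFor(OffsetResetStrategy configured,
                                                   boolean partitionCreatedAfterSubscribe) {
            // The only behavioral change in direction 2: a partition that appears after
            // the consumer has subscribed (a newly expanded partition) is initialized to
            // earliest even when "auto.offset.reset" is "latest", so its first records
            // are not skipped. Everything else behaves exactly as today.
            if (configured == OffsetResetStrategy.LATEST && partitionCreatedAfterSubscribe) {
                return OffsetResetStrategy.EARLIEST;
            }
            return configured;
        }
    }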

I hope you can help me settle on a direction for this issue. Thank you.

Best,
hudeqi
