I am reposting the updated KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms
"hudeqi" <16120...@bjtu.edu.cn>写道: > Hello, have any mates who have discussed it before seen it? Also welcome new > mates to discuss together. > > "hudeqi" <16120...@bjtu.edu.cn>写道: > > Long time no see, this issue has been discussed for a long time, now please > > allow me to summarize this issue, and then everyone can help to see which > > direction this issue should go in? > > > > There are two problems to be solved by this kip: > > 1. Solve the problem that when the client configures the > > "auto.offset.reset" to latest, the new partition data may be lost when the > > consumer resets the offset to the latest after expanding the topic > > partition. > > > > 2. In addition to the "earliest", "latest", and "none" provided by the > > existing "auto.offset.reset", it also provides more abundant parameters, > > such as "latest_on_start" (application startup is reset to latest, and an > > exception is thrown if out of range occurs), "earliest_on_start" > > (application startup is reset to earliest, and an exception is thrown if > > out of range occurs), "nearest"(determined by "auto.offset.reset" when the > > program starts, and choose earliest or latest according to the distance > > between the current offset and log start offset and log end offset when out > > of range occurs). > > > > According to the discussion results of the members above, it seems that > > there are concerns about adding these additional offset reset mechanisms: > > complexity and compatibility. In fact, these parameters do have > > corresponding benefits. Therefore, based on the above discussion results, I > > have sorted out two solution directions. You can help me to see which > > direction to follow: > > > > 1. The first one is to follow Guozhang's suggestion: keep the three > > parameters of "auto.offset.reset" and their meanings unchanged, reduce the > > confusion for Kafka users, and solve the compatibility problem by the way. > > Add these two parameters: > > a. 
"auto.offset.reset.on.no.initial.offse": Indicates the strategy used > > to initialize the offset. The default value is the parameter configured by > > "auto.offset.reset". If so, the strategy for initializing the offset > > remains unchanged from the previous behavior, ensuring compatibility. If > > the parameter is configured with "latest_on_start" or "earliest_on_start", > > then the offset will be reset according to the configured semantics when > > initializing the offset. In this way, the problem of data loss during > > partition expansion can be solved: configure > > "auto.offset.reset.on.no.initial.offset" to "latest_on_start", and > > configure "auto.offset.reset" to earliest. > > b. "auto.offset.reset.on.invalid.offset": Indicates that the offset is > > illegal or out of range occurs. The default value is the parameter > > configured by "auto.offset.reset". If so, the processing of out of range is > > the same as before to ensure compatibility. If "nearest" is configured, > > then the semantic logic corresponding to "nearest" is used only for the > > case of out of range. > > > > This solution ensures compatibility and ensures that the semantics of the > > original configuration remain unchanged. Only two incremental > > configurations are added to flexibly handle different situations. > > > > 2. The second is to directly reduce the complexity of this problem, and > > directly add the logic of resetting the initial offset of the newly > > expanded partition to the earliest to "auto.offset.reset"="latest". In this > > way, Kafka users do not need to perceive this subtle but useful change, and > > the processing of other situations remains unchanged (without considering > > too many rich offset processing mechanisms). > > > > I hope you can help me with the direction of the solution to this issue, > > thank you. > > > > Best, > > hudeqi