Hello, has anyone who took part in the earlier discussion seen this? Newcomers are also welcome to join the discussion.
"hudeqi" <16120...@bjtu.edu.cn>写道: > Long time no see, this issue has been discussed for a long time, now please > allow me to summarize this issue, and then everyone can help to see which > direction this issue should go in? > > There are two problems to be solved by this kip: > 1. Solve the problem that when the client configures the "auto.offset.reset" > to latest, the new partition data may be lost when the consumer resets the > offset to the latest after expanding the topic partition. > > 2. In addition to the "earliest", "latest", and "none" provided by the > existing "auto.offset.reset", it also provides more abundant parameters, such > as "latest_on_start" (application startup is reset to latest, and an > exception is thrown if out of range occurs), "earliest_on_start" (application > startup is reset to earliest, and an exception is thrown if out of range > occurs), "nearest"(determined by "auto.offset.reset" when the program starts, > and choose earliest or latest according to the distance between the current > offset and log start offset and log end offset when out of range occurs). > > According to the discussion results of the members above, it seems that there > are concerns about adding these additional offset reset mechanisms: > complexity and compatibility. In fact, these parameters do have corresponding > benefits. Therefore, based on the above discussion results, I have sorted out > two solution directions. You can help me to see which direction to follow: > > 1. The first one is to follow Guozhang's suggestion: keep the three > parameters of "auto.offset.reset" and their meanings unchanged, reduce the > confusion for Kafka users, and solve the compatibility problem by the way. > Add these two parameters: > a. "auto.offset.reset.on.no.initial.offse": Indicates the strategy used > to initialize the offset. The default value is the parameter configured by > "auto.offset.reset". 
If so, the strategy for initializing the offset remains > unchanged from the previous behavior, ensuring compatibility. If the > parameter is configured with "latest_on_start" or "earliest_on_start", then > the offset will be reset according to the configured semantics when > initializing the offset. In this way, the problem of data loss during > partition expansion can be solved: configure > "auto.offset.reset.on.no.initial.offset" to "latest_on_start", and configure > "auto.offset.reset" to earliest. > b. "auto.offset.reset.on.invalid.offset": Indicates that the offset is > illegal or out of range occurs. The default value is the parameter configured > by "auto.offset.reset". If so, the processing of out of range is the same as > before to ensure compatibility. If "nearest" is configured, then the semantic > logic corresponding to "nearest" is used only for the case of out of range. > > This solution ensures compatibility and ensures that the semantics of the > original configuration remain unchanged. Only two incremental configurations > are added to flexibly handle different situations. > > 2. The second is to directly reduce the complexity of this problem, and > directly add the logic of resetting the initial offset of the newly expanded > partition to the earliest to "auto.offset.reset"="latest". In this way, Kafka > users do not need to perceive this subtle but useful change, and the > processing of other situations remains unchanged (without considering too > many rich offset processing mechanisms). > > I hope you can help me with the direction of the solution to this issue, > thank you. > > Best, > hudeqi
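
To make the fallback rule and the "nearest" semantics from the summary above a bit more concrete, here is a rough Python sketch. It is only an illustration and not Kafka client code: the function names `resolve_strategy` and `nearest` are hypothetical, the two new config keys are the ones proposed in option 1 (they do not exist in Kafka today), and breaking a tie toward earliest is my own assumption.

```python
# Illustrative sketch only; none of this is part of the Kafka client.

BASE_KEY = "auto.offset.reset"


def resolve_strategy(config, key):
    """Return the strategy for `key`, defaulting to "auto.offset.reset".

    `key` would be one of the two proposed (hypothetical) configs:
    "auto.offset.reset.on.no.initial.offset" or
    "auto.offset.reset.on.invalid.offset".
    """
    return config.get(key, config[BASE_KEY])


def nearest(current_offset, log_start_offset, log_end_offset):
    """Pick "earliest" or "latest" for an out of range offset.

    Chooses the end of the log that is closer to the current offset.
    Assumption: a tie goes to "earliest".
    """
    dist_to_start = abs(current_offset - log_start_offset)
    dist_to_end = abs(current_offset - log_end_offset)
    return "earliest" if dist_to_start <= dist_to_end else "latest"


if __name__ == "__main__":
    # The combination suggested for the partition expansion case.
    config = {
        "auto.offset.reset": "earliest",
        "auto.offset.reset.on.no.initial.offset": "latest_on_start",
    }
    print(resolve_strategy(config, "auto.offset.reset.on.no.initial.offset"))
    print(resolve_strategy(config, "auto.offset.reset.on.invalid.offset"))
    # Offset fell behind the log start (for example, after retention).
    print(nearest(5, 100, 200))
```

With this shape, a consumer that sets neither new key behaves exactly as it does today, which is the compatibility point of option 1.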