tzulitai commented on PR #52: URL: https://github.com/apache/flink-connector-kafka/pull/52#issuecomment-1766974667
@Tan-JiaLiang @mas-chen I do agree that changing the split assignment logic in the enumerator right now is too big of a change, although I do see the merit of doing that.

> this can happen with any offset initializer strategy (Flink provided ones, custom implementations).

I disagree here. The `LATEST_OFFSET` initializer strategy is particularly interesting because `LATEST` means different things depending on when the offset is physically resolved, whereas for the end user, the intention behind using `LATEST` is for it to loosely correlate to "start progress from the time of job submission". Therefore, if we don't eagerly resolve `LATEST` markers before the first checkpoint occurs, from the users' perspective we can be skipping records.

Other strategies don't have this issue:

1. For `EARLIEST`, if there were no records initially by the first snapshot, on restore we still attempt to read from the earliest position.
2. For `TIMESTAMP`, if there were no records initially by the first snapshot, on restore we still attempt to read from said timestamp.
3. For `GROUP_OFFSETS`, if there were no records initially by the first snapshot, on restore we still attempt to read from whatever group offsets were indicated by Kafka.
4. For `SPECIFIC_OFFSETS`, if there were no records initially by the first snapshot, on restore we still attempt to read from the specified offsets.
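To make the difference concrete, here is a minimal, Flink-free sketch of the scenario described above. It is not the connector's code: the class, the marker constants, and the helper names are all hypothetical, and the partition is modeled as just a log-start and log-end offset. It assumes the partition is empty at job submission, the first checkpoint happens before any record is read, and five records arrive before the job is restored.

```java
public class LatestOffsetSketch {
    // Hypothetical marker values for this sketch only; they stand in for
    // "not yet resolved" starting offsets stored in a split.
    static final long LATEST_MARKER = -1L;
    static final long EARLIEST_MARKER = -2L;

    /** Turns a marker into a concrete offset; concrete offsets pass through. */
    static long resolve(long offsetOrMarker, long logStart, long logEnd) {
        if (offsetOrMarker == LATEST_MARKER) {
            return logEnd;
        }
        if (offsetOrMarker == EARLIEST_MARKER) {
            return logStart;
        }
        return offsetOrMarker;
    }

    /**
     * Returns how many records are skipped after restoring from a first
     * checkpoint that was taken before any record was consumed.
     *
     * @param eager whether the marker is resolved before the first checkpoint
     */
    static long skippedAfterRestore(
            long marker, long logEndAtSubmission, long logEndAtRestore, boolean eager) {
        long logStart = 0L; // assumption: nothing has been deleted from the log
        // What the user expects: the position as of job submission.
        long intended = resolve(marker, logStart, logEndAtSubmission);
        // What the checkpoint stores: a concrete offset (eager) or the raw marker (lazy).
        long checkpointed = eager ? intended : marker;
        // Where reading actually starts after restore.
        long actual = resolve(checkpointed, logStart, logEndAtRestore);
        return actual - intended;
    }

    public static void main(String[] args) {
        // Empty partition at submission; 5 records arrive before restore.
        System.out.println("LATEST lazy:   " + skippedAfterRestore(LATEST_MARKER, 0, 5, false));   // 5
        System.out.println("LATEST eager:  " + skippedAfterRestore(LATEST_MARKER, 0, 5, true));    // 0
        System.out.println("EARLIEST lazy: " + skippedAfterRestore(EARLIEST_MARKER, 0, 5, false)); // 0
    }
}
```

Only the lazily resolved `LATEST` case moves the starting position between snapshot and restore; an unresolved `EARLIEST` marker resolves to the same position either way, which matches point 1 above.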
