tzulitai commented on PR #52:
URL: 
https://github.com/apache/flink-connector-kafka/pull/52#issuecomment-1766974667

   @Tan-JiaLiang @mas-chen 
   
   I do agree that changing the split assignment logic in the enumerator right 
now is too big of a change, although I do see the merit of doing it.
   
   > this can happen with any offset initializer strategy (Flink provided ones, 
custom implementations). 
   
   I disagree here.
   
   The `LATEST_OFFSET` initializer strategy is particularly interesting 
because `LATEST` means different things depending on when the offset is 
physically resolved. For the end user, however, choosing `LATEST` is meant to 
loosely correlate with "start progress from the time of job submission". 
Therefore, if we don't eagerly resolve `LATEST` markers before the first 
checkpoint occurs, from the user's perspective we can end up skipping records: 
a checkpoint that still holds an unresolved `LATEST` marker is resolved only on 
restore, against the log end offset at that moment, so records produced between 
submission and restore are never read.
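   To make the failure mode concrete, here is a minimal Java sketch. It is a 
toy model under stated assumptions, not the connector's code: 
`LatestOffsetSketch`, `Partition`, `resolve` and the `-1` `LATEST_MARKER` 
sentinel are all hypothetical, and it assumes a single partition where the 
first checkpoint completes before the reader has fetched anything. It compares 
restoring from a checkpoint that still holds the unresolved marker against one 
where the marker was resolved eagerly at submission.

   ```java
   /**
    * Hypothetical model, not Flink's actual classes: one partition plus a split
    * state that stores either a concrete offset or a LATEST marker.
    */
   public class LatestOffsetSketch {

       // Hypothetical sentinel meaning "resolve to the log end offset when used".
       static final long LATEST_MARKER = -1L;

       /** One partition whose log end offset grows as records are produced. */
       static final class Partition {
           private long logEndOffset = 0L;

           void produce(int records) {
               logEndOffset += records;
           }

           long logEndOffset() {
               return logEndOffset;
           }
       }

       /** Turns a stored starting position into a concrete offset, right now. */
       static long resolve(long storedOffset, Partition partition) {
           return storedOffset == LATEST_MARKER ? partition.logEndOffset() : storedOffset;
       }

       /**
        * Submit the job, take the first checkpoint before any record is read,
        * produce more records, fail, restore from that checkpoint, and return how
        * many records produced after submission the restored job will never read.
        */
       static long skippedAfterRestore(boolean resolveEagerly) {
           Partition partition = new Partition();
           partition.produce(10); // data that existed before job submission

           long submissionOffset = partition.logEndOffset(); // what the user means by LATEST
           long splitState = resolveEagerly ? resolve(LATEST_MARKER, partition) : LATEST_MARKER;

           long checkpointed = splitState; // nothing read yet, so state is unchanged

           partition.produce(5); // produced after submission, before the failure

           long restoredStart = resolve(checkpointed, partition);
           return restoredStart - submissionOffset;
       }

       public static void main(String[] args) {
           System.out.println("lazy resolution:  skipped " + skippedAfterRestore(false)); // 5
           System.out.println("eager resolution: skipped " + skippedAfterRestore(true));  // 0
       }
   }
   ```

   With lazy resolution the restored job starts 5 records past the submission 
point; with eager resolution the checkpoint carries a concrete offset, so the 
restore replays from where the user expected progress to begin.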
   
   Other strategies don't have this issue:
   1. For `EARLIEST`, if no records had been read by the first snapshot, on 
restore we still attempt to read from the earliest position.
   2. For `TIMESTAMP`, if no records had been read by the first snapshot, on 
restore we still attempt to read from said timestamp.
   3. For `GROUP_OFFSETS`, if no records had been read by the first snapshot, 
on restore we still attempt to read from whatever group offsets are committed 
in Kafka.
   4. For `SPECIFIC_OFFSETS`, if no records had been read by the first 
snapshot, on restore we still attempt to read from the specified offsets.
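   The same contrast can be checked with a toy model of all five strategies. 
This is a hypothetical sketch, not the connector's `OffsetsInitializer` 
implementations: the class, enum and method names are illustrative, and it 
assumes monotonically increasing record timestamps, no retention-based deletion 
at the head of the log (so `EARLIEST` is offset 0), and that nothing else 
commits offsets for the consumer group between submission and restore. Each 
strategy is resolved once at submission and once at restore, after more 
records have arrived.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class StartingOffsetStrategies {

       // Hypothetical names mirroring the strategies discussed above.
       enum Strategy { EARLIEST, LATEST, TIMESTAMP, GROUP_OFFSETS, SPECIFIC_OFFSETS }

       /** One partition modeled as record timestamps; list index == offset. */
       static final class Partition {
           final List<Long> timestamps = new ArrayList<>();

           void produce(long timestampMs) {
               timestamps.add(timestampMs);
           }

           long logEndOffset() {
               return timestamps.size();
           }
       }

       /** Resolves a strategy to a concrete offset against the partition as it is now. */
       static long resolve(Strategy strategy, Partition p, long startTimestampMs,
                           long committedOffset, long specificOffset) {
           switch (strategy) {
               case EARLIEST:
                   return 0L; // assumes retention has not deleted the head of the log
               case LATEST:
                   return p.logEndOffset();
               case TIMESTAMP:
                   // First record at or after the timestamp, else the log end offset.
                   for (int offset = 0; offset < p.timestamps.size(); offset++) {
                       if (p.timestamps.get(offset) >= startTimestampMs) {
                           return offset;
                       }
                   }
                   return p.logEndOffset();
               case GROUP_OFFSETS:
                   return committedOffset; // assumes no one else commits for this group
               case SPECIFIC_OFFSETS:
                   return specificOffset;
               default:
                   throw new IllegalArgumentException("unknown strategy: " + strategy);
           }
       }

       /** True if resolving at submission and at restore yields the same start offset. */
       static boolean stableAcrossRestore(Strategy strategy) {
           Partition p = new Partition();
           for (long ts = 0; ts < 10; ts++) {
               p.produce(ts); // offsets 0..9 existed before submission
           }
           long submitTimestampMs = 10L;
           long committedOffset = 4L;
           long specificOffset = 7L;

           long atSubmission = resolve(strategy, p, submitTimestampMs, committedOffset, specificOffset);
           for (long ts = 10; ts < 15; ts++) {
               p.produce(ts); // offsets 10..14 produced before the restore
           }
           long atRestore = resolve(strategy, p, submitTimestampMs, committedOffset, specificOffset);
           return atSubmission == atRestore;
       }

       public static void main(String[] args) {
           for (Strategy s : Strategy.values()) {
               System.out.println(s + " stable across restore: " + stableAcrossRestore(s));
           }
       }
   }
   ```

   Only `LATEST` reports `false`. Under these assumptions the other strategies 
are anchored to something that does not move when new records arrive (the log 
head, a timestamp, a committed or user-supplied offset), so resolving them late 
gives the same answer as resolving them early.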


-- 
This is an automated message from the Apache Git Service.