coltmcnealy-lh commented on code in PR #14735:
URL: https://github.com/apache/kafka/pull/14735#discussion_r1392849939


##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java:
##########
@@ -1012,6 +1022,8 @@ private void prepareChangelogs(final Map<TaskId, Task> 
tasks,
                 // no records to restore; in this case we just initialize the 
sensor to zero
                 final long recordsToRestore = 
Math.max(changelogMetadata.restoreEndOffset - startOffset, 0L);
                 task.recordRestoration(time, recordsToRestore, true);
+            }  else if (changelogMetadata.stateManager.taskType() == 
TaskType.STANDBY && storeMetadata.endOffset() != null) {

Review Comment:
   I think the `endOffset == null` could happen when the consumer hasn't yet 
made a `poll()` for a certain partition. Which means that I think it will be 
null in most cases, honestly.
   
   So this is actually a bit of a dilemma. Since in most cases we won't know 
the end offset until we have made our first call to `poll()` (and then 
`onBatchLoaded()`), the way the code is currently written makes me think we 
will almost never have a call to `onUpdateStart()`, which kind of defeats the 
purpose of the `onUpdateStart()` callback.
   
   I see two options here:
   1.  Pass in some sentinel value when we don't know the `endOffset` upon 
initialization of the Standby Task. Sentinel value could be `-1` or `null` or 
`Optional.Empty`...apparently my team thinks I am a really crappy programmer, 
so I don't have the right to opine on which one :joy: 
   2. Remove the `endOffset` parameter from the `onUpdateStart()` method 
signature. This might require changing the KIP but I don't think it would take 
a vote.
   
   Personally, I prefer option 2). In the most common case, we won't have the 
end offset, so I wouldn't want to "mislead" someone reading the javadoc into 
thinking that they might get some info that we probably don't have.
   
   And the `onUpdateStart()` in practice is normally followed by 
`onBatchLoaded()` just a few hundred milliseconds later. The `onBatchLoaded()` 
will contain the `endOffset`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to