coltmcnealy-lh commented on code in PR #14735: URL: https://github.com/apache/kafka/pull/14735#discussion_r1392849939
########## streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java: ########## @@ -1012,6 +1022,8 @@ private void prepareChangelogs(final Map<TaskId, Task> tasks, // no records to restore; in this case we just initialize the sensor to zero final long recordsToRestore = Math.max(changelogMetadata.restoreEndOffset - startOffset, 0L); task.recordRestoration(time, recordsToRestore, true); + } else if (changelogMetadata.stateManager.taskType() == TaskType.STANDBY && storeMetadata.endOffset() != null) { Review Comment: I think the `endOffset == null` could happen when the consumer hasn't yet made a `poll()` for a certain partition. Which means that I think it will be null in most cases, honestly. So this is actually a bit of a dilemma. Since in most cases we won't know the end offset until we have made our first call to `poll()` (and then `onBatchLoaded()`), the way the code is currently written makes me think we will almost never have a call to `onUpdateStart()`, which kind of defeats the purpose of the `onUpdateStart()` callback. I see two options here: 1. Pass in some sentinel value when we don't know the `endOffset` upon initialization of the Standby Task. Sentinel value could be `-1` or `null` or `Optional.Empty`...apparently my team thinks I am a really crappy programmer, so I don't have the right to opine on which one :joy: 2. Remove the `endOffset` parameter from the `onUpdateStart()` method signature. This might require changing the KIP but I don't think it would take a vote. Personally, I prefer option 2). In the most common case, we won't have the end offset, so I wouldn't want to "mislead" someone reading the javadoc into thinking that they might get some info that we probably don't have. And the `onUpdateStart()` in practice is normally followed by `onBatchLoaded()` just a few hundred milliseconds later. The `onBatchLoaded()` will contain the `endOffset`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org