vvcephei commented on code in PR #13364: URL: https://github.com/apache/kafka/pull/13364#discussion_r1153437348
########## streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBVersionedStoreSegmentValueFormatter.java: ########## @@ -495,6 +501,41 @@ private boolean isLastIndex(final int index) { return unpackedReversedTimestampAndValueSizes.get(index).timestamp == minTimestamp; } + private void truncateRecordsToTimestamp(final long timestamp) { + if (timestamp <= minTimestamp) { + // delete everything in this current segment by replacing it with a degenerate segment + initializeWithRecord(new ValueAndValueSize(null), timestamp, timestamp); + return; + } + + final SegmentSearchResult searchResult = find(timestamp, false); + // all records with later timestamps should be removed + int fullRecordsToTruncate = searchResult.index(); Review Comment: Agreed; it's a tricky question. It seems like either way, we need to clearly document/enforce the contract so that future generations can have a reasonably good chance of not breaking the system. Ideally, there would be at one test verifying the contract is followed (eg, the order of inserts into the underlying segments), so that when someone breaks it, they'll also break the test, which can have a name and comment explaining the contract. The next-best thing would be to structure the code in a way that's pretty self-explanatory, though a sequence of refactors can destroy this property without any specific step being obviously wrong. I don't think I like the idea of just throwing an IllegalStateException in production code. It puts the user in a really bad position where they've upgraded to a new version that is now crashing in production, and they have to decide whether it's safe to downgrade or whether they need to wait for a fix, and all the while they're incurring downtime. Note that the IllegalState condition may take a while to manifest; imagine what you'd be going through if your app started crashing with this error a month after your last upgrade. OTOH, we do have recoverable exceptions in Streams, where you can signal to the runtime that the task is corrupted, triggering the purge-and-rebuild-from-changelog automatically. Perhaps that, coupled with an ERROR log telling the user to file a bug ticket, would be the best path forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org