guozhangwang commented on a change in pull request #8661:
URL: https://github.com/apache/kafka/pull/8661#discussion_r424761219
##########
File path: streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBSegmentedBytesStore.java
##########
@@ -248,17 +243,6 @@ void restoreAllInternal(final Collection<KeyValue<byte[], byte[]>> records) {
             final long segmentId = segments.segmentId(timestamp);
             final S segment = segments.getOrCreateSegmentIfLive(segmentId, context, observedStreamTime);
             if (segment != null) {
-                // This handles the case that state store is moved to a new client and does not
-                // have the local RocksDB instance for the segment. In this case, toggleDBForBulkLoading
-                // will only close the database and open it again with bulk loading enabled.
-                if (!bulkLoadSegments.contains(segment)) {
-                    segment.toggleDbForBulkLoading(true);
-                    // If the store does not exist yet, the getOrCreateSegmentIfLive will call openDB that
-                    // makes the open flag for the newly created store.
-                    // if the store does exist already, then toggleDbForBulkLoading will make sure that
-                    // the store is already open here.
-                    bulkLoadSegments = new HashSet<>(segments.allSegments());
-                }

Review comment:
Actually, even for standby tasks it should be beneficial to use bulk loading, right (e.g. if the standby is far behind the active task and has a large number of records to catch up on)? I'm thinking that in the long run we could optionally allow restore callbacks to be triggered for standby tasks as well: with a simple heuristic, if the changelog log-end offset minus the standby task's store offset exceeds a certain threshold, we trigger onRestoreStart(), and then we go back from this "sprinting" mode to normal mode once we are close enough to the log-end offset.

In the meantime, we could hack it a bit: when we call `segment.toggleDbForBulkLoading` to enable bulk loading we set a flag, and when we call it to disable bulk loading we reset the flag. Then, during restoreAll, we check the flag to decide whether to enable bulk loading for a newly created segment.
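The lag heuristic for standby tasks could look roughly like the following minimal sketch. It is not Kafka Streams code: `StandbySprintController`, its `Transition` enum, and both thresholds are hypothetical names. Using a separate, lower exit threshold (hysteresis) is an added assumption, meant to keep a standby whose lag hovers near a single threshold from flipping between modes on every poll.

```java
// Hypothetical sketch (not a Kafka Streams API): decides when a standby task should
// enter and leave a bulk-loading ("sprinting") mode, based on its changelog lag.
final class StandbySprintController {

    /** What the caller should do after observing the current lag. */
    enum Transition { START_BULK_LOAD, END_BULK_LOAD, NONE }

    private final long enterLagThreshold; // lag above which sprinting starts
    private final long exitLagThreshold;  // lag at or below which sprinting stops
    private boolean sprinting = false;

    StandbySprintController(final long enterLagThreshold, final long exitLagThreshold) {
        if (exitLagThreshold < 0 || exitLagThreshold >= enterLagThreshold) {
            throw new IllegalArgumentException("require 0 <= exitLagThreshold < enterLagThreshold");
        }
        this.enterLagThreshold = enterLagThreshold;
        this.exitLagThreshold = exitLagThreshold;
    }

    /** Compares the changelog log-end offset with the store's offset and reports any mode change. */
    Transition observe(final long changelogEndOffset, final long storeOffset) {
        final long lag = Math.max(0L, changelogEndOffset - storeOffset);
        if (!sprinting && lag > enterLagThreshold) {
            sprinting = true;
            return Transition.START_BULK_LOAD; // caller would trigger onRestoreStart() here
        }
        if (sprinting && lag <= exitLagThreshold) {
            sprinting = false;
            return Transition.END_BULK_LOAD;   // caller would switch back to normal mode here
        }
        return Transition.NONE;
    }

    boolean isSprinting() {
        return sprinting;
    }

    public static void main(final String[] args) {
        final StandbySprintController controller = new StandbySprintController(10_000L, 1_000L);
        System.out.println(controller.observe(50_000L, 0L));      // lag 50000: START_BULK_LOAD
        System.out.println(controller.observe(50_000L, 30_000L)); // lag 20000: NONE, still sprinting
        System.out.println(controller.observe(50_000L, 49_500L)); // lag 500: END_BULK_LOAD
        System.out.println(controller.observe(52_000L, 49_500L)); // lag 2500: NONE, below entry threshold
    }
}
```

The two thresholds are independent knobs: the entry threshold controls how far behind a standby must fall before the cost of reopening RocksDB with bulk-loading options is worth it, and the exit threshold controls how close to the log-end offset it must get before returning to normal mode.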
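The interim flag approach could be modeled with the following minimal, self-contained sketch. It does not reproduce `AbstractRocksDBSegmentedBytesStore`. `FlaggedSegmentedStore` and `FakeSegment` are hypothetical stand-ins, and only the method name `toggleDbForBulkLoading` is taken from the diff above. The store records a store-wide bulk-loading flag whenever it toggles all segments, and `restoreAll` applies that flag to any segment it creates, so the `bulkLoadSegments` bookkeeping is no longer needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the flag idea: remember whether the store as a whole is in
// bulk-loading mode and apply that mode to segments created during restoration.
final class FlaggedSegmentedStore {
    private final Map<Long, FakeSegment> segments = new TreeMap<>();
    private final long segmentInterval;
    private boolean bulkLoadingEnabled = false; // set on restore start, reset on restore end

    FlaggedSegmentedStore(final long segmentInterval) {
        this.segmentInterval = segmentInterval;
    }

    void onRestoreStart() { setBulkLoading(true); }

    void onRestoreEnd() { setBulkLoading(false); }

    private void setBulkLoading(final boolean enabled) {
        bulkLoadingEnabled = enabled; // the flag the review comment proposes
        for (final FakeSegment segment : segments.values()) {
            segment.toggleDbForBulkLoading(enabled);
        }
    }

    void restoreAll(final List<Long> recordTimestamps) {
        for (final long timestamp : recordTimestamps) {
            final long segmentId = timestamp / segmentInterval;
            FakeSegment segment = segments.get(segmentId);
            if (segment == null) {
                segment = new FakeSegment(segmentId);
                segments.put(segmentId, segment);
                // A newly created segment inherits the store-wide mode
                // instead of always starting in normal mode.
                if (bulkLoadingEnabled) {
                    segment.toggleDbForBulkLoading(true);
                }
            }
            // ... the record itself would be written into `segment` here ...
        }
    }

    List<FakeSegment> allSegments() {
        return new ArrayList<>(segments.values());
    }

    public static void main(final String[] args) {
        final FlaggedSegmentedStore store = new FlaggedSegmentedStore(60_000L);
        store.restoreAll(List.of(0L));        // segment 0 is created in normal mode
        store.onRestoreStart();                // segment 0 switches to bulk loading
        store.restoreAll(List.of(120_000L));  // segment 2 is created directly in bulk-loading mode
        for (final FakeSegment s : store.allSegments()) {
            System.out.println(s.id + " bulkLoading=" + s.isBulkLoading());
        }
    }
}

// Hypothetical stand-in for a RocksDB segment that only tracks its loading mode.
final class FakeSegment {
    final long id;
    private boolean bulkLoading = false;

    FakeSegment(final long id) { this.id = id; }

    void toggleDbForBulkLoading(final boolean prepareForBulkload) {
        // A real segment would close and reopen RocksDB with different options here.
        bulkLoading = prepareForBulkload;
    }

    boolean isBulkLoading() { return bulkLoading; }
}
```

Keeping a single boolean on the store avoids rebuilding a set of segments after each creation. The trade-off is that the flag must be reset on every path that ends restoration, or new segments would keep being opened in bulk-loading mode.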