[GitHub] [kafka] guozhangwang commented on a change in pull request #8661: KAFKA-9603: Do not turn on bulk loading for segmented stores on stand-by tasks

GitBox Wed, 13 May 2020 15:13:42 -0700


guozhangwang commented on a change in pull request #8661:
URL: https://github.com/apache/kafka/pull/8661#discussion_r424761219




##########
File path: 
streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBSegmentedBytesStore.java
##########
@@ -248,17 +243,6 @@ void restoreAllInternal(final Collection<KeyValue<byte[], byte[]>> records) {
             final long segmentId = segments.segmentId(timestamp);
             final S segment = segments.getOrCreateSegmentIfLive(segmentId, context, observedStreamTime);
             if (segment != null) {
-                // This handles the case that state store is moved to a new client and does not
-                // have the local RocksDB instance for the segment. In this case, toggleDBForBulkLoading
-                // will only close the database and open it again with bulk loading enabled.
-                if (!bulkLoadSegments.contains(segment)) {
-                    segment.toggleDbForBulkLoading(true);
-                    // If the store does not exist yet, the getOrCreateSegmentIfLive will call openDB that
-                    // makes the open flag for the newly created store.
-                    // if the store does exist already, then toggleDbForBulkLoading will make sure that
-                    // the store is already open here.
-                    bulkLoadSegments = new HashSet<>(segments.allSegments());
-                }

Review comment:
   Actually, even for standby tasks it should be beneficial to use bulk
loading, right (e.g. if the standby is far behind the active and has a large
number of records)?
   
   I'm thinking that in the long run we could optionally allow restore
callbacks to be triggered for standbys as well: with a simple heuristic, if
(changelog log-end offset - standby task's store offset) > some threshold, we
trigger onRestoreStart(), and then we go back from the "sprinting" mode to
normal mode once we are close enough to the log-end offset.
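
A rough sketch of that heuristic, just to make the idea concrete. `StandbyLagHeuristic` and both thresholds are hypothetical names for illustration, not existing Kafka Streams classes or configs, and the separate exit threshold is an assumption added here so the mode does not flap around a single cutoff:

```java
/** Hypothetical helper (not part of Kafka Streams): decides when a standby should "sprint". */
final class StandbyLagHeuristic {
    private final long enterThreshold; // lag above which bulk loading would be switched on
    private final long exitThreshold;  // lag at or below which it would be switched off again
    private boolean sprinting = false;

    StandbyLagHeuristic(final long enterThreshold, final long exitThreshold) {
        if (exitThreshold > enterThreshold) {
            throw new IllegalArgumentException("exitThreshold must not exceed enterThreshold");
        }
        this.enterThreshold = enterThreshold;
        this.exitThreshold = exitThreshold;
    }

    /** Returns true while the standby should stay in bulk-loading ("sprinting") mode. */
    boolean update(final long logEndOffset, final long storeOffset) {
        final long lag = Math.max(0L, logEndOffset - storeOffset);
        if (!sprinting && lag > enterThreshold) {
            sprinting = true;   // the point where onRestoreStart() would be triggered
        } else if (sprinting && lag <= exitThreshold) {
            sprinting = false;  // the point where the end-of-restore callback would be triggered
        }
        return sprinting;
    }
}
```

The caller would invoke `update(...)` each time the standby applies a batch and only act on transitions of the returned value.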
   
   In the meantime, we could maybe hack a bit: set a flag when
`segment.toggleDbForBulkLoading` turns bulk loading on and reset it when bulk
loading is turned off, then check that flag during restoreAll to decide whether
to enable bulk loading for a newly created segment.
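
Something along these lines, as a minimal self-contained sketch rather than the actual store code. `BulkLoadFlagSketch`, `FakeSegment`, `bulkLoadingEnabled` and `createSegmentDuringRestore` are stand-in names invented for illustration; only `toggleDbForBulkLoading` mirrors a name from the diff above:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a segmented store that remembers the current bulk-loading mode. */
final class BulkLoadFlagSketch {

    /** Stand-in for a RocksDB segment; it only records the mode it was last toggled to. */
    static final class FakeSegment {
        boolean bulkLoading = false;

        void toggleDbForBulkLoading(final boolean prepareForBulkLoad) {
            bulkLoading = prepareForBulkLoad;
        }
    }

    private final List<FakeSegment> segments = new ArrayList<>();
    private boolean bulkLoadingEnabled = false; // the flag proposed in the comment

    /** Called when restoration starts (true) or ends (false). */
    void toggleForBulkLoading(final boolean prepareForBulkLoad) {
        bulkLoadingEnabled = prepareForBulkLoad;
        for (final FakeSegment segment : segments) {
            segment.toggleDbForBulkLoading(prepareForBulkLoad);
        }
    }

    /** Models restoreAll creating a segment it has not seen before. */
    FakeSegment createSegmentDuringRestore() {
        final FakeSegment segment = new FakeSegment();
        if (bulkLoadingEnabled) {
            // Only segments created while the flag is set are opened for bulk loading.
            segment.toggleDbForBulkLoading(true);
        }
        segments.add(segment);
        return segment;
    }
}
```

With this shape, a standby that never receives the restore-start callback never sets the flag, so its new segments stay in normal mode.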




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
