[ 
https://issues.apache.org/jira/browse/KAFKA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006460#comment-17006460
 ] 

Guozhang Wang commented on KAFKA-7041:
--------------------------------------

Today we do not trigger restoreCallback#onStart/Restore/End when updating 
standby tasks, only when restoring active tasks. When I looked at this piece 
of the code, I felt it may be better to first look into larger (sorted) 
batching of records. Today we apply the restore logic to each set of records 
that a single consumer.poll() returns. If there are many partitions to fetch, 
each partition may return only a small number of records, yet we still apply 
them to the store immediately, which may lead to smaller L0 files (we use the 
batching restorer for RocksDB, which calls writeBatch).

A possible optimization is to wait and accumulate records across several 
polls rather than applying them to the store after every poll. cc [~cadonna]
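
To make the idea concrete, here is a rough sketch of buffering polled records 
and applying them as one larger, key-sorted batch. It is not the Streams 
restore code: BufferedRestorer, KV, flushThreshold and storeWriter are 
hypothetical names, and the store write is reduced to a plain callback that 
stands in for something like the batching restorer's writeBatch path.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

public class BufferedRestorer {

    /** Hypothetical stand-in for one restored changelog record. */
    public static final class KV {
        public final String key;
        public final String value;

        public KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int flushThreshold;               // assumed tuning knob
    private final Consumer<List<KV>> storeWriter;   // stands in for the store's batch write
    private final List<KV> buffer = new ArrayList<>();

    public BufferedRestorer(int flushThreshold, Consumer<List<KV>> storeWriter) {
        this.flushThreshold = flushThreshold;
        this.storeWriter = storeWriter;
    }

    /** Call with whatever a single poll returned; nothing is written until enough has piled up. */
    public void onPolled(List<KV> polled) {
        buffer.addAll(polled);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Sort the buffered records by key and hand them to the store as one batch. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<KV> batch = new ArrayList<>(buffer);
        // List.sort is stable, so several updates to the same key keep their
        // arrival order and the latest update is still applied last.
        batch.sort(Comparator.comparing((KV kv) -> kv.key));
        buffer.clear();
        storeWriter.accept(batch);
    }
}
```

The caller would also need to call flush() when restoration ends, so that a 
partially filled buffer is not lost. The threshold trades memory and 
restore latency against the size of each write.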

> Using RocksDB bulk loading for StandbyTasks
> -------------------------------------------
>
>                 Key: KAFKA-7041
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7041
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Nikki Thean
>            Priority: Major
>
> In KAFKA-5363 we introduced RocksDB bulk loading to speed up store recovery. 
> We could do the same optimization for StandbyTasks to make them more 
> efficient and to reduce the likelihood that StandbyTasks lag behind.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
