Sophie Blee-Goldman created KAFKA-8627:
------------------------------------------

             Summary: Investigate batching on state restore
                 Key: KAFKA-8627
                 URL: https://issues.apache.org/jira/browse/KAFKA-8627
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Sophie Blee-Goldman


Currently, when rebuilding state from scratch, we form batches from whatever 
poll() returns and write them to RocksDB. Given the structure of 
RocksDB, inserting large sorted batches gives the best performance when writing 
large amounts of data at once, so we should investigate the potential 
restore-time improvement from:

1) Larger batches – either by tuning the restore consumer to return larger 
amounts of data, buffering records into larger batches, or both

2) Sorting batches 

 

These two factors are likely to be coupled, so we should measure the 
performance gains and losses while varying both where possible (i.e., turn 
sorting on and off across a range of batch sizes).
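
To make the two knobs concrete, here is a minimal sketch of a buffer that 
accumulates restored records across several poll() calls and flushes them in 
larger, optionally sorted batches. RestoreBatcher, Entry, and the sink callback 
are hypothetical names used only for illustration; they are not part of Kafka 
Streams or RocksDB. The sketch assumes the store uses RocksDB's default 
bytewise comparator, which orders keys as unsigned bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers restored records and flushes them in larger batches. */
class RestoreBatcher {

    /** A raw key/value pair as read from the changelog topic. */
    public static final class Entry {
        public final byte[] key;
        public final byte[] value;

        public Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Unsigned lexicographic order (assumed to match the store's key order).
    private static final Comparator<Entry> BY_KEY =
        (a, b) -> Arrays.compareUnsigned(a.key, b.key);

    private final int maxBatchSize;
    private final boolean sort;
    private final Consumer<List<Entry>> sink; // stand-in for the actual store write
    private final List<Entry> buffer = new ArrayList<>();

    public RestoreBatcher(int maxBatchSize, boolean sort, Consumer<List<Entry>> sink) {
        this.maxBatchSize = maxBatchSize;
        this.sort = sort;
        this.sink = sink;
    }

    /** Adds the records from one poll(); flushes each time the buffer fills up. */
    public void add(List<Entry> polled) {
        for (Entry e : polled) {
            buffer.add(e);
            if (buffer.size() >= maxBatchSize) {
                flush();
            }
        }
    }

    /** Writes out whatever is buffered; call once more when restoration ends. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Entry> batch = new ArrayList<>(buffer);
        buffer.clear();
        if (sort) {
            // List.sort is stable, so repeated updates to the same key keep
            // their changelog order and the latest value is still written last.
            batch.sort(BY_KEY);
        }
        sink.accept(batch);
    }
}
```

An experiment along the lines described above could construct this with 
different maxBatchSize values and sort set to true or false, and time the sink 
for each combination.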



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
