[ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024902#comment-17024902 ]
Navinder Brar commented on KAFKA-9450:
--------------------------------------

Or do you suggest to never (ie, for EOS and non-EOS case) call `innerByteStore#flush()`? This might be possible but would have a negative impact on non-EOS as it would make current fault-tolerance mechanism for non-EOS less efficient (we would not have a guarantee on commit that data is flushed to disk and might need to recover more data from the changelog topic in case of failure).

>>> [~mjsax] do you mean we still write to the checkpoint file (for non-EOS)
>>> on every commit but remove the flush? That would be dangerous, right? If
>>> the data from RocksDB has not been flushed for the checkpoints that have
>>> already been written to the checkpoint file, we have lost that data and
>>> moved ahead anyway.

Can we add event listeners on RocksDB (`EventListener::OnFlushCompleted()`)
and, whenever a particular store is flushed, commit the checkpoint for that
store (changelog) in the checkpoint file? Currently, by making the commit (and
hence the flush) time-based, we are effectively overriding most
performance-related RocksDB configs (memtable size, max write buffers). If
this seems reasonable, I can work on this.

> Decouple inner state flushing from committing with EOS
> ------------------------------------------------------
>
>                 Key: KAFKA-9450
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9450
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> When EOS is turned on, the commit interval is set quite low (100ms) and all
> the store layers are flushed during a commit. This is necessary for
> forwarding records in the cache to the changelog, but unfortunately also
> forces rocksdb to flush the current memtable before it's full. The result is
> a large number of small writes to disk, losing the benefits of batching, and
> a large number of very small L0 files that are likely to slow compaction.
>
> Since we have to delete the stores to recreate from scratch anyways during an
> unclean shutdown with EOS, we may as well skip flushing the innermost
> StateStore during a commit and only do so during a graceful shutdown, before
> a rebalance, etc. This is currently blocked on a refactoring of the state
> store layers to allow decoupling the flush of the caching layer from the
> actual state store.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
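The listener-based checkpointing proposed in the comment above can be sketched as follows. This is a minimal Java model under stated assumptions, not the RocksJava API: `FlushAwareCheckpointer`, `recordWrite` and `onFlushCompleted` are hypothetical names. It assumes the flush notification reports the largest sequence number contained in the flushed data (RocksDB's C++ `FlushJobInfo` carries a `largest_seqno` field) and that each write can be mapped to the changelog offset it came from. Mapping sequence numbers to offsets, instead of checkpointing whatever offset was last written when the callback fires, avoids the hazard raised in the comment: a checkpoint that runs ahead of the data actually on disk.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: decides which changelog offset is safe to checkpoint
 * for one store, driven by flush-completed notifications rather than by
 * forcing a flush on every commit.
 */
final class FlushAwareCheckpointer {
    // Sequence number of each write (modelled on RocksDB seqnos) -> changelog offset.
    private final NavigableMap<Long, Long> seqnoToOffset = new TreeMap<>();
    // Highest changelog offset known to be flushed to disk; -1 means none yet.
    private long checkpointedOffset = -1L;
    private long nextSeqno = 1L;

    /** Called for every write to the store; returns the sequence number assigned to it. */
    synchronized long recordWrite(long changelogOffset) {
        long seqno = nextSeqno++;
        seqnoToOffset.put(seqno, changelogOffset);
        return seqno;
    }

    /** Called from a (hypothetical) flush-completed callback with the largest flushed seqno. */
    synchronized void onFlushCompleted(long largestFlushedSeqno) {
        Map.Entry<Long, Long> newest = seqnoToOffset.floorEntry(largestFlushedSeqno);
        if (newest != null) {
            // Offsets grow with seqno, so the floor entry holds the newest durable offset.
            checkpointedOffset = Math.max(checkpointedOffset, newest.getValue());
            // Everything at or below the flushed seqno is on disk; stop tracking it.
            seqnoToOffset.headMap(largestFlushedSeqno, true).clear();
        }
    }

    /** Offset to write into the checkpoint file on commit. */
    synchronized long checkpointOffset() {
        return checkpointedOffset;
    }
}
```

On commit, the store would write `checkpointOffset()` rather than its latest written offset, so commits no longer need to force a memtable flush; the trade-off is that recovery may replay more of the changelog than it does today.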
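The decoupling described in the issue can be sketched as a small Java model. It is hypothetical and does not reflect Kafka Streams' actual store layers: `InnerStore`, `CachingLayer`, `commit()` and `close()` are illustrative names. Under EOS, a commit only forwards dirty cache entries to the changelog and leaves the inner store alone; the inner flush happens on graceful close (or before a rebalance). Without EOS, the inner flush stays on the commit path, since the checkpoint written at commit must be backed by data on disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the innermost persistent store (e.g. RocksDB-backed). */
final class InnerStore {
    int flushCount = 0;

    void flush() {
        // A real store would write its memtable to disk here; we just count calls.
        flushCount++;
    }
}

/** Hypothetical caching layer whose cache flush is decoupled from the inner flush. */
final class CachingLayer {
    private final InnerStore inner;
    private final boolean eosEnabled;
    private final List<String> dirty = new ArrayList<>();
    /** Records forwarded to the changelog, modelled as a list. */
    final List<String> changelog = new ArrayList<>();

    CachingLayer(InnerStore inner, boolean eosEnabled) {
        this.inner = inner;
        this.eosEnabled = eosEnabled;
    }

    void put(String record) {
        dirty.add(record);
    }

    /** Forward dirty cache entries to the changelog; needed on every commit. */
    private void flushCache() {
        changelog.addAll(dirty);
        dirty.clear();
    }

    void commit() {
        flushCache();
        if (!eosEnabled) {
            // Non-EOS: the checkpoint written on commit relies on data being on disk.
            inner.flush();
        }
        // EOS: skip the inner flush; after an unclean shutdown the store is rebuilt anyway.
    }

    /** Graceful shutdown or pre-rebalance: flush every layer. */
    void close() {
        flushCache();
        inner.flush();
    }
}
```

With EOS enabled, repeated 100ms commits in this model never touch the inner store, so the memtable is left to fill up and flush on its own terms until `close()`.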