[ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054270#comment-17054270 ]
John Roesler commented on KAFKA-7224:
-------------------------------------

I didn't realize I'd left this ticket in progress. I intended to shelve this work until there was some concrete ask for it.

After the implementation in the PR, I ran some benchmarks, and I found that the performance with RocksDB-backed suppression was _absolutely terrible_. I think it was something like two orders of magnitude slower, much slower even than regular RocksDB-backed persistent store operations. The key problem was that the suppression buffer relies on scans, and scans in RocksDB are absurdly slow. I looked into RocksDB optimizations, but didn't find anything remotely promising.

It might be the case that you'd be fine with a huge performance penalty in exchange for the "final result" semantics, but it seems like it would have to be a very niche use case: low throughput (so the performance is tolerable) but large amounts of intermediate results (so that the in-memory buffer wouldn't be sufficient).

I wasn't confident that such a use case would actually exist, and on the other hand, it felt like a massive potential for frustration to drop such a poor-performing component into the codebase, even if I were to pepper the javadocs with warnings about it.

So I decided just to pause work on it pending more information.

-John

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Assignee: John Roesler
>            Priority: Major
>
> As described in
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)