[ 
https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054270#comment-17054270
 ] 

John Roesler commented on KAFKA-7224:
-------------------------------------

I didn't realize I'd left this ticket in progress. I intended to shelve this 
work until there was some concrete ask for it.

After the implementation in the PR, I ran some benchmarks, and I found that the 
performance with rocksdb-backed suppression was _absolutely terrible_. I think 
it was about two orders of magnitude slower, much slower even than regular 
rocksdb-backed persistent store operations. The key problem was that the 
suppression buffer relies on scans, and scans in RocksDB are absurdly slow. I 
looked into rocksdb optimizations, but didn't find anything remotely promising.

It might be the case that you'd be fine with a huge performance penalty in 
exchange for the "final result" semantics, but it seems like it would have to 
be a very niche use case: low throughput (so the performance is tolerable) but 
large amounts of intermediate results (so that the in-memory buffer wouldn't be 
sufficient).

I wasn't confident that such a use case would actually exist, and on the other 
hand, it felt like a massive potential for frustration to drop such a 
poor-performing component into the codebase, even if I were to pepper the 
javadocs with warnings about it. So I decided just to pause work on it pending 
more information.

-John

> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Assignee: John Roesler
>            Priority: Major
>
> As described in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
