[ https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388304#comment-17388304 ]
A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

I was just re-reading the wiki page on the Merge Operator, and now I wonder if it may not be _as_ helpful as I originally thought, but it can probably still offer some improvement. Here's my take, let me know what you think.

Regardless of whether a custom MergeOperator suffers from the same performance impact of crossing the JNI, I would bet that use cases such as list-append would still be more performant, since reading out an entire list, appending to it, and then writing the entire thing back is a lot of I/O. There are also the built-in, native MergeOperators that wouldn't need to cross the JNI, such as the UInt64AddOperator as you point out. So there are definitely cases where a MergeOperator would still outperform a read-modify-write (RMW) sequence.

The thing I didn't fully appreciate before (but which seems kind of obvious now that I think of it) is that the merge() call doesn't actually return the current value, either before or after the merge. So if we have to know this value in addition to updating it, we still need to do a get(), and using merge() instead of RMW only saves us the cost of `put(full_merged_value) - put(single_update_value)`. For constant-size values, like the uint64 unfortunately, that means there's pretty much no savings at all. So we don't even need to worry about whether/how to handle the fact that this is now a ValueAndTimestamp instead of a plain Value (i.e. a Long in the case of count()), because I don't think there's likely to be any performance improvement there. I didn't realize that at the time of filing this ticket, so maybe we should look past its current title.

This still leaves some cases that could potentially benefit from even a custom MergeOperator, such as list-append or any other where the difference in size between the full_merged_value and the single_update_value is very large.
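To make the argument above concrete, here is a minimal plain-Java sketch of the semantics of RocksDB's built-in UInt64AddOperator, under the assumption that values are fixed-width little-endian 64-bit integers (the real operator runs natively in C++ and never crosses the JNI; the class and helper names here are illustrative, not from RocksJava):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of a uint64-add merge: merge(existing, operand)
// produces the value that ends up stored. Note that it hands the merged
// value to the store, not back to the caller -- db.merge() itself returns
// nothing, which is why a count() that must emit the new total still
// needs a separate get().
public class UInt64AddSketch {

    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                         .putLong(v).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    static byte[] merge(byte[] existing, byte[] operand) {
        long base = (existing == null) ? 0L : decode(existing);
        return encode(base + decode(operand));
    }

    public static void main(String[] args) {
        byte[] stored = null;
        for (long delta : new long[] {1, 1, 40}) {
            stored = merge(stored, encode(delta));
        }
        // The merge operand (8 bytes) is exactly as large as the full merged
        // value (8 bytes), so merge() writes no fewer bytes than put() would
        // -- unlike list-append, where the operand can be far smaller than
        // the full list.
        System.out.println(decode(stored)); // prints 42
    }
}
```

The point of the sketch is the comment at the end: for a fixed-size counter the operand and the merged value are the same size, so the `put(full_merged_value) - put(single_update_value)` savings collapse to zero.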
So it could be worth doing a POC of something like this and benchmarking it for a KIP. But tbh, having seen how messy it is to add new operators to the StateStore interface at the moment, I think we should probably avoid doing so unless there's good motivation and a clear benefit. In this case, while there may be a benefit, I'm not sure there's good motivation since no user has requested this feature yet. Of course that could just be because they aren't aware of the possibility, so how about this: we update the title of this ticket to describe the possible new feature, and then see if any users chime in here or vote on the ticket. If we can gauge real user interest, it makes more sense to put time into doing this. WDYT?

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sagar Rao
>            Priority: Major
>
> In addition to regular put/get/delete, RocksDB provides a fourth operation, merge. This essentially provides an optimized read/update/write path in a single operation. One of the built-in (C++) merge operators exposed over the Java API is a counter. We should be able to leverage this for a more efficient implementation of count().
>
> (Note: Unfortunately it seems unlikely we can use this to optimize general aggregations, even if RocksJava allowed for a custom merge operator, unless we provide a way for the user to specify and connect a C++-implemented aggregator -- otherwise we incur too much cost crossing the JNI for a net performance benefit.)

-- This message was sent by Atlassian Jira (v8.3.4#803005)