[ https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388304#comment-17388304 ]
A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

I was just re-reading the wiki page on the Merge Operator, and now I wonder if it may not be _as_ helpful as I originally thought, but it can probably still offer some improvement. Here's my take, let me know what you think.

Regardless of whether a custom MergeOperator suffers from the same performance impact of crossing the JNI, I would bet that use cases such as list-append would still be more performant, since reading out an entire list, appending to it, and then writing the entire thing back is a lot of I/O. There are also the built-in, native MergeOperators that wouldn't need to cross the JNI, such as the UInt64AddOperator as you point out. So there are definitely cases where a MergeOperator would still outperform a read-modify-write (RMW) sequence.

The thing I didn't fully appreciate before (but which seems kind of obvious now that I think of it) is that the merge() call doesn't actually return the current value, either before or after the merge. So if we have to know this value in addition to updating it, we still need to do a get(), and using merge() instead of RMW only saves us the cost of `put(full_merged_value) - put(single_update_value)`. For constant-size values, like the uint64 unfortunately, that means there's pretty much no savings at all. So we don't even need to worry about whether/how to handle the fact that this is now a ValueAndTimestamp instead of a plain Value (i.e. a Long in the case of count()), because I don't think there's likely to be any performance improvement there. I didn't realize that at the time of filing this ticket, so maybe we should look past its current title.

This still leaves some cases that could potentially benefit from even a custom MergeOperator, such as list-append or any other where the difference in size between the full_merged_value and the single_update_value is very large.
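To make the argument above concrete, here is a minimal plain-Java sketch of the semantics of RocksDB's built-in UInt64AddOperator, under the assumption that values are fixed-width little-endian 64-bit integers (the real operator runs natively in C++ and never crosses the JNI; the class and helper names here are illustrative, not from RocksJava):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of a uint64-add merge: merge(existing, operand)
// produces the value that ends up stored. Note that it hands the merged
// value to the store, not back to the caller -- db.merge() itself returns
// nothing, which is why a count() that must emit the new total still
// needs a separate get().
public class UInt64AddSketch {

    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                         .putLong(v).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    static byte[] merge(byte[] existing, byte[] operand) {
        long base = (existing == null) ? 0L : decode(existing);
        return encode(base + decode(operand));
    }

    public static void main(String[] args) {
        byte[] stored = null;
        for (long delta : new long[] {1, 1, 40}) {
            stored = merge(stored, encode(delta));
        }
        // The merge operand (8 bytes) is exactly as large as the full merged
        // value (8 bytes), so merge() writes no fewer bytes than put() would
        // -- unlike list-append, where the operand can be far smaller than
        // the full list.
        System.out.println(decode(stored)); // prints 42
    }
}
```

The point of the sketch is the comment at the end: for a fixed-size counter the operand and the merged value are the same size, so the `put(full_merged_value) - put(single_update_value)` savings collapse to zero.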
So it could be worth doing a POC of something like this and benchmarking it for a KIP. But tbh, having seen how messy it is to add new operators to the StateStore interface at the moment, I think we should probably avoid doing so unless there's good motivation and a clear benefit. In this case, while there may be a benefit, I'm not sure there's good motivation since no user has requested this feature yet. Of course that could just be because they aren't aware of the possibility, so how about this: we update the title of this ticket to describe the possible new feature, and then see if any users chime in here or vote on the ticket. If we can gauge real user interest, it makes more sense to put time into doing this. WDYT?

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sagar Rao
>            Priority: Major
>
> In addition to regular put/get/delete, RocksDB provides a fourth operation, merge. This essentially provides an optimized read/update/write path in a single operation. One of the built-in (C++) merge operators exposed over the Java API is a counter. We should be able to leverage this for a more efficient implementation of count().
>
> (Note: Unfortunately it seems unlikely we can use this to optimize general aggregations, even if RocksJava allowed for a custom merge operator, unless we provide a way for the user to specify and connect a C++-implemented aggregator -- otherwise we incur too much cost crossing the JNI for a net performance benefit.)

-- This message was sent by Atlassian Jira (v8.3.4#803005)