[https://issues.apache.org/jira/browse/KAFKA-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

John Roesler updated KAFKA-13524:
---------------------------------
    Description: 
The Record Cache in Kafka Streams is more properly termed a write buffer, since 
it only caches writes, not reads, and its intent is to buffer the writes before 
flushing them in bulk into lower store layers.
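To make the "write buffer" framing concrete, here is a minimal sketch, assuming plain in-memory maps stand in for both the buffer and the lower store layer. The class `WriteBuffer` and its methods are hypothetical illustrations, not Kafka Streams classes. Writes only touch the buffer until `flush()` pushes them down in one batch; a null buffered value models a pending delete.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a write buffer in front of a store.
// Not the Kafka Streams API: both layers are plain in-memory maps here.
class WriteBuffer {
    // Dirty writes since the last flush; a null value marks a pending delete.
    private final Map<String, String> dirty = new LinkedHashMap<>();
    // Stand-in for the lower store layer.
    final Map<String, String> lowerStore = new TreeMap<>();

    void put(String key, String value) {
        dirty.put(key, value); // buffered only; the lower store is untouched
    }

    void delete(String key) {
        dirty.put(key, null); // buffer a tombstone instead of deleting immediately
    }

    int pendingWrites() {
        return dirty.size();
    }

    // Push all buffered writes down in one batch, then clear the buffer.
    void flush() {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            if (e.getValue() == null) {
                lowerStore.remove(e.getKey());
            } else {
                lowerStore.put(e.getKey(), e.getValue());
            }
        }
        dirty.clear();
    }
}
```

In this sketch, repeated writes to the same key overwrite one buffered entry, so only the latest value reaches the lower store on flush.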

Unlike scan-type queries, which require scanning both the record cache and the 
underlying store and collating the results, the KeyQuery (and any other point 
lookup) can straightforwardly be served from the record cache if the key is 
buffered there, or fall through to the underlying store if it is not.
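The cache-first lookup can be sketched as follows, assuming plain maps for the cache and the store. `CachedPointLookup` is a hypothetical name, not a Kafka Streams class. The sketch also assumes, for illustration, that a buffered delete is an entry mapped to null: such a key is "buffered", so it must not fall through to the store, which may still hold a stale value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cache-first point lookup; plain maps stand in for
// the record cache and the underlying store.
class CachedPointLookup {
    // Buffered entries; a key mapped to null is a buffered delete.
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> store = new HashMap<>();

    String get(String key) {
        if (cache.containsKey(key)) {
            // Buffered: the cache holds the freshest value, including a pending delete (null).
            return cache.get(key);
        }
        // Not buffered: fall through to the underlying store.
        return store.get(key);
    }
}
```

Using `containsKey` rather than checking `cache.get(key) != null` is what keeps a buffered delete from resurrecting the store's older value.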

Also, in contrast to scan-type operations, benchmarks show that serving 
key-based reads from the cache is faster than always skipping the cache.

Therefore, it makes sense to implement a handler in the CachingKeyValueStore 
for the KeyQuery specifically in order to serve fresher key-based lookups. Scan 
queries may also be useful, but their less flattering performance profile makes 
it reasonable to leave them for follow-on work.
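A rough sketch of the layering: the caching layer answers key lookups itself when the key is buffered and passes every other query (including scans) to the wrapped store untouched. The types `Query`, `KeyLookup`, `ScanAll`, and `CachingLayer` are hypothetical stand-ins, deliberately not the real IQv2 `KeyQuery`/`StateStore.query` API, and the wrapped store is modeled as a simple function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical query types for illustration; not the Kafka Streams IQv2 classes.
interface Query {}

final class KeyLookup implements Query {
    final String key;
    KeyLookup(String key) { this.key = key; }
}

final class ScanAll implements Query {}

// Hypothetical caching layer: answers key lookups itself, delegates everything else.
class CachingLayer {
    // Buffered writes; a key mapped to null is a buffered delete.
    final Map<String, String> cache = new HashMap<>();
    // The wrapped store's query handler, which answers from the store alone.
    private final Function<Query, Object> wrapped;

    CachingLayer(Function<Query, Object> wrapped) {
        this.wrapped = wrapped;
    }

    Object query(Query q) {
        if (q instanceof KeyLookup) {
            String key = ((KeyLookup) q).key;
            if (cache.containsKey(key)) {
                return cache.get(key); // served from the cache: freshest value
            }
        }
        // Unbuffered keys and all other query types (e.g. scans) pass through unchanged.
        return wrapped.apply(q);
    }
}
```

Leaving scans on the pass-through path matches the plan of deferring cache-aware scans to follow-on work.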

We could add an option to disable cache reads on the KeyQuery, but since they 
seem to always be better, I'm leaning toward just unilaterally serving cached 
records if they exist.

I did a quick POC of this: 
[https://github.com/vvcephei/kafka/pull/new/iqv2-poc-cache-queries]

The internal code of the caching stores should be refactored to share logic 
with the regular store methods. Scan queries will be more complicated, since 
they require merging the cache with the wrapped result.
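Merging the cache with the wrapped result for a scan amounts to a sorted two-way merge. The sketch below assumes both layers are sorted maps, that a cache entry shadows a store entry with the same key, and that a null cache value is a buffered delete that hides the store entry. `MergedScan` is a hypothetical name; a real implementation would merge lazy iterators rather than materializing a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch: merge a sorted cache with a sorted store for a full scan.
// A null cache value is a buffered delete and hides the store entry with the same key.
final class MergedScan {
    static List<Map.Entry<String, String>> scan(NavigableMap<String, String> cache,
                                                NavigableMap<String, String> store) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> c = cache.entrySet().iterator();
        Iterator<Map.Entry<String, String>> s = store.entrySet().iterator();
        Map.Entry<String, String> ce = c.hasNext() ? c.next() : null;
        Map.Entry<String, String> se = s.hasNext() ? s.next() : null;
        while (ce != null || se != null) {
            int cmp;
            if (ce == null) cmp = 1;          // only store entries remain
            else if (se == null) cmp = -1;    // only cache entries remain
            else cmp = ce.getKey().compareTo(se.getKey());
            if (cmp <= 0) {
                // Cache entry comes first, or shadows the store entry with the same key.
                if (ce.getValue() != null) out.add(Map.entry(ce.getKey(), ce.getValue()));
                if (cmp == 0) se = s.hasNext() ? s.next() : null; // skip the shadowed store entry
                ce = c.hasNext() ? c.next() : null;
            } else {
                out.add(Map.entry(se.getKey(), se.getValue()));
                se = s.hasNext() ? s.next() : null;
            }
        }
        return out;
    }
}
```

For example, with store `{a=1, b=2, d=4}` and cache `{b=20, c=30, d=<deleted>}`, the merged scan yields `a=1, b=20, c=30`.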

There is a bug related to the non-timestamped-store serde hack (see the 
failing test when you run IQv2StoreIntegrationTest): even though the inner 
store is not timestamped, the cache returns a timestamped value. We'll have to 
discuss options to fix it.
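One candidate option for that discussion, sketched here only as an illustration, is to strip the timestamp from cached bytes before answering a query against a non-timestamped inner store. This assumes the timestamped layout is an 8-byte big-endian timestamp followed by the plain value bytes (the layout Kafka Streams' `ValueAndTimestampSerializer` writes). `TimestampStripper` is a hypothetical helper, not an existing Kafka method.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not an existing Kafka Streams method. Assumes the layout
// [8-byte big-endian timestamp][plain value bytes].
final class TimestampStripper {
    static final int TIMESTAMP_BYTES = Long.BYTES;

    // Returns the plain value bytes, or null for a null (deleted) value.
    static byte[] plainValue(byte[] timestamped) {
        if (timestamped == null) return null;
        if (timestamped.length < TIMESTAMP_BYTES) {
            throw new IllegalArgumentException("too short to carry a timestamp prefix");
        }
        return Arrays.copyOfRange(timestamped, TIMESTAMP_BYTES, timestamped.length);
    }

    // Reads the leading timestamp (big-endian, ByteBuffer's default byte order).
    static long timestamp(byte[] timestamped) {
        return ByteBuffer.wrap(timestamped, 0, TIMESTAMP_BYTES).getLong();
    }
}
```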

  was:
I did a quick POC of this: 
[https://github.com/vvcephei/kafka/pull/new/iqv2-poc-cache-queries]

The internal code of the caching stores should be refactored to share logic 
with the regular store methods. Scan queries will be more complicated, since 
they require merging the cache with the wrapped result.

There is a bug related to that non-timestamped-store-serde hack (see the 
failing test when you run IQv2StoreIntegrationTest). Even though the inner 
store is not timestamped, the cache returns a timestamped value. We'll have to 
discuss options to fix it.


> IQv2: Implement KeyQuery from the RecordCache
> ---------------------------------------------
>
>                 Key: KAFKA-13524
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13524
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: John Roesler
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)