[ https://issues.apache.org/jira/browse/KAFKA-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879569#comment-16879569 ]
Matthias J. Sax commented on KAFKA-4212:
----------------------------------------

I agree about the application patterns and that it would be useful. :)

{quote}I very much could be wrong, but I think rocksDB and a topic setup with `cleanup.policy=delete` & `delete.retention.ms` both work off wall-clock time{quote}

It's not exactly the same: each time a put() is done on RocksDB, the current wall-clock time is used for the put(). For Kafka topics, however, the record timestamp is used, and the record timestamp can be set explicitly on send().

{quote}in particular, imagine the cache is configured with the same TTL as the topic but that the cache is offline for a couple hours, when the cache comes back online it will hold onto the values for an extra couple hours{quote}

Yes, but it could also go the other way: assume the upstream pipeline writing into your input topics is offline. Then you don't process any data, but RocksDB would still expire data even though your application is idle.

Last, I personally believe that Kafka's retention time as it is designed atm is not perfect either, because it mixes wall-clock time and event-time. Also, because it's a distributed system, relying on wall-clock time that could diverge between brokers and the application could be problematic (and I don't mean some ms-scale time skew, but potentially hours or days if there is no proper time synchronization).

The only thing I am saying is that we should come up with a good design and not ship a "broken" feature. This includes the ability to keep RocksDB and the corresponding changelog topic in sync, and because of how RocksDB and the brokers work, I am not 100% sure atm how to achieve this. Maybe a mechanism similar to the one we use for window stores would help. In fact, we might even want to improve how window stores expire data for the same reasons.

Thanks for the details [~savulchik]!
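To make the distinction concrete, here is a minimal, self-contained sketch (hypothetical code, not from Kafka or RocksDB): one put() variant stamps entries with the current wall-clock time (RocksDB-TTL style), the other takes an explicit record timestamp (as Kafka's send() allows). With event-time stamps, an entry processed late, e.g. after an upstream outage, can already be past its TTL; with wall-clock stamps it would instead survive an extra while, which is exactly the asymmetry discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical TTL map, for illustration only. Expiry is evaluated lazily
// on get() against a caller-supplied clock, so the two policies differ only
// in which timestamp was recorded at put().
class TtlMap<K, V> {
    private static final class Entry<V> {
        final V value;
        final long timestampMs;
        Entry(V value, long timestampMs) {
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlMs;

    TtlMap(long ttlMs) { this.ttlMs = ttlMs; }

    // Kafka-style: the caller supplies the record timestamp explicitly.
    void put(K key, V value, long recordTimestampMs) {
        map.put(key, new Entry<>(value, recordTimestampMs));
    }

    // RocksDB-TTL-style: the current wall-clock time is stamped at put().
    void put(K key, V value) {
        put(key, value, System.currentTimeMillis());
    }

    // Returns null (and evicts) when the entry is older than the TTL.
    V get(K key, long nowMs) {
        Entry<V> e = map.get(key);
        if (e == null || nowMs - e.timestampMs > ttlMs) {
            map.remove(key);
            return null;
        }
        return e.value;
    }
}

public class TtlDemo {
    public static void main(String[] args) {
        long hour = 3_600_000L;
        TtlMap<String, String> cache = new TtlMap<>(2 * hour);

        // A record with event-time t=0 is processed late, at t=3h, e.g.
        // because the upstream pipeline was offline for a while.
        cache.put("k", "v", 0L);

        // Event-time expiry: at t=3h the entry is already past its 2h TTL.
        System.out.println(cache.get("k", 3 * hour)); // prints "null"
    }
}
```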
> Add a key-value store that is a TTL persistent cache
> ----------------------------------------------------
>
> Key: KAFKA-4212
> URL: https://issues.apache.org/jira/browse/KAFKA-4212
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.0.1
> Reporter: Elias Levy
> Priority: Major
> Labels: api
>
> Some jobs need to maintain as state a large set of key-values for some period of time, i.e. they need to maintain a TTL cache of values potentially larger than memory.
> Currently Kafka Streams provides non-windowed and windowed key-value stores. Neither is an exact fit for this use case.
> The {{RocksDBStore}}, a {{KeyValueStore}}, stores one value per key as required, but does not support expiration. The TTL option of RocksDB is explicitly not used.
> The {{RocksDBWindowsStore}}, a {{WindowsStore}}, can expire items via segment dropping, but it stores multiple items per key, based on their timestamp. This store can, however, be repurposed as a cache by fetching the items in reverse chronological order and returning the first item found.
> KAFKA-2594 introduced a fixed-capacity in-memory LRU caching store, but here we desire a variable-capacity memory-overflowing TTL caching store.
> Although {{RocksDBWindowsStore}} can be repurposed as a cache, it would be useful to have an official and proper TTL cache API and implementation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
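The workaround the description mentions can be sketched as follows (a hypothetical helper, not the actual Kafka Streams window-store API): keep multiple timestamped entries per key, drop entries that have fallen out of the retention window (mimicking segment dropping), and serve the newest survivor as the cache hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of "window store repurposed as a TTL cache":
// multiple (timestamp, value) entries per key; lookups return the newest
// entry still inside the retention window.
class WindowedCache<K, V> {
    private final Map<K, TreeMap<Long, V>> entries = new HashMap<>();
    private final long retentionMs;

    WindowedCache(long retentionMs) { this.retentionMs = retentionMs; }

    void put(K key, V value, long timestampMs) {
        entries.computeIfAbsent(key, k -> new TreeMap<>()).put(timestampMs, value);
    }

    // "Fetch in reverse chronological order and return the first item found":
    // here, the last entry of the per-key TreeMap after expiring old ones.
    V fetchLatest(K key, long nowMs) {
        TreeMap<Long, V> perKey = entries.get(key);
        if (perKey == null) return null;
        // Drop everything older than the retention window, the rough
        // analogue of segment dropping.
        perKey.headMap(nowMs - retentionMs, true).clear();
        Map.Entry<Long, V> newest = perKey.lastEntry();
        return newest == null ? null : newest.getValue();
    }
}
```

Note that, as discussed in the comment above, such a cache expires by whatever timestamps the caller supplies; passing record timestamps gives event-time semantics, passing System.currentTimeMillis() gives wall-clock semantics.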