GitHub user dpoldrugo opened a pull request:
https://github.com/apache/kafka/pull/2159
KAFKA 4273 - Add TTL support for RocksDB
Since the Streams DSL doesn't support fine-grained configuration of state
stores (it uses only RocksDB), I have added a new StreamsConfig property called
`rocksdb.ttl.sec`, which lets you set a TTL for all state stores used by
the topology. In short, if you set the property to a value `>=1`, Streams will use
TtlDB instead of RocksDB, and records will expire after
the configured period.
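For illustration, enabling the TTL could look like the minimal sketch below. Only the `rocksdb.ttl.sec` key comes from this PR; the class name, the helper methods, the application id, and the broker address are placeholders, and treating an unset property as "TTL disabled" is an assumption.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of the patch.
class TtlConfigExample {
    // Property name introduced by this PR.
    static final String ROCKSDB_TTL_SEC = "rocksdb.ttl.sec";

    // Builds Streams properties with a TTL (in seconds) for all state stores.
    static Properties streamsProps(int ttlSeconds) {
        Properties props = new Properties();
        props.put("application.id", "ttl-demo");          // placeholder value
        props.put("bootstrap.servers", "localhost:9092"); // placeholder value
        props.put(ROCKSDB_TTL_SEC, Integer.toString(ttlSeconds));
        return props;
    }

    // Mirrors the rule described above: a value >= 1 selects TtlDB.
    // Assumption: a missing property means TTL is disabled.
    static boolean usesTtlDb(Properties props) {
        return Integer.parseInt(props.getProperty(ROCKSDB_TTL_SEC, "0")) >= 1;
    }

    public static void main(String[] args) {
        Properties props = streamsProps(3600); // keep records for one hour
        System.out.println("TtlDB enabled: " + usesTtlDb(props));
    }
}
```

The resulting `Properties` object would be passed to the Streams application in the usual way; the `usesTtlDb` check only restates the `>=1` rule and is not the patch's actual code path.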
This should help users bound their disk usage and provides a
configuration for use cases where the data has a natural TTL/retention. For
example, when you process data for only one hour and no longer need it
in the state stores after that.
I have added a
[test](https://github.com/apache/kafka/compare/trunk...dpoldrugo:KAFKA-4273-ttl-support?expand=1#diff-d908a80c770d196ac823752da3b3a864R117)
to check that TtlDB expires records, but I can't make TtlDB expire a record
within a reasonable window (1 minute). Do you have any suggestions on how to
force TtlDB to expire records more quickly?
Since I'm using Kafka and Kafka Streams 0.10.1.0, I have also added this
code to the
[0.10.1](https://github.com/dpoldrugo/kafka/tree/0.10.1-KAFKA-4273-ttl-support)
branch, and if the review goes well I hope it can be added to the 0.10.1.1
release.
The patch is here:
[KAFKA_4273_Add_TTL_support_for_RocksDB_v2.patch.txt](https://github.com/apache/kafka/files/607638/KAFKA_4273_Add_TTL_support_for_RocksDB_v2.patch.txt)
**Suggestion for future work**
Since this config/feature applies to all state stores, it would be nice to
provide an API that lets users configure the TTL per state store, for example
while building the topology with KStreamBuilder.
Now: KStreamBuilder#table(String topic, final String storeName)
Suggestion: KStreamBuilder#table(String topic, final String storeName,
**_StoreOptions_ storeOptions**)
Where **_StoreOptions_** would be something like this: `{ ttlSeconds: int }`
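One possible shape for such an options object, as a rough sketch: the `StoreOptions` name and the `ttlSeconds` field come from the suggestion above, while the factory methods, the validation, and the `table(...)` overload mentioned in the trailing comment are hypothetical and do not exist in KStreamBuilder today.

```java
// Hypothetical per-store options; nothing like this exists in Kafka Streams yet.
final class StoreOptions {
    private final int ttlSeconds;

    private StoreOptions(int ttlSeconds) {
        this.ttlSeconds = ttlSeconds;
    }

    // Creates options with a TTL; rejects values that would not enable expiry.
    static StoreOptions withTtlSeconds(int ttlSeconds) {
        if (ttlSeconds < 1) {
            throw new IllegalArgumentException("ttlSeconds must be >= 1");
        }
        return new StoreOptions(ttlSeconds);
    }

    // Creates options for a store whose records never expire.
    static StoreOptions noTtl() {
        return new StoreOptions(0);
    }

    int ttlSeconds() {
        return ttlSeconds;
    }

    boolean ttlEnabled() {
        return ttlSeconds >= 1;
    }
}

// Intended usage with the suggested (not yet existing) overload:
//   builder.table("my-topic", "my-store", StoreOptions.withTtlSeconds(3600));
```

An immutable value object with factory methods keeps room for further per-store settings later without changing the `table(...)` signature again.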
More details: [KAFKA-4273](https://issues.apache.org/jira/browse/KAFKA-4273)
@guozhangwang @dguy @mjsax @norwood @enothereska @ijuma - could you
check this out?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dpoldrugo/kafka KAFKA-4273-ttl-support
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/2159.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2159
----
commit 5a3f1372daf2a0e939b246756c7e712e9ea21662
Author: dpoldrugo <[email protected]>
Date: 2016-11-22T21:01:02Z
KAFKA 4273 - Add TTL support for RocksDB
----