Balaji Rao created KAFKA-14624: ---------------------------------- Summary: State restoration is broken with standby tasks and cache-enabled stores in processor API Key: KAFKA-14624 URL: https://issues.apache.org/jira/browse/KAFKA-14624 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 3.3.1 Reporter: Balaji Rao
I found that cache-enabled state stores in PAPI with standby tasks sometimes returns stale data when a partition moves from one app instance to another and back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a small project that I used to reproduce the issue. I dug around a bit and it seems like it's a bug in standby task state restoration when caching is enabled. If a partition moves from instance 1 to 2 and then back to instance 1, since the `CachingKeyValueStore` doesn't register a restore callback, it can return potentially stale data for non-dirty keys. I could fix the issue by modifying the `CachingKeyValueStore` to register a restore callback in which the cache restored keys are added to the cache. Is this fix in the right direction? {code:java} // register the store context.register( root, (RecordBatchingStateRestoreCallback) records -> { for (final ConsumerRecord<byte[], byte[]> record : records) { put(Bytes.wrap(record.key()), record.value()); } } ); {code} I would like to contribute a fix, if I can get some help! -- This message was sent by Atlassian Jira (v8.20.10#820010)