Souquières Adam created KAFKA-20699:
---------------------------------------
Summary: TopologyTestDriver.getStateStore corrupts the task's
record context
Key: KAFKA-20699
URL: https://issues.apache.org/jira/browse/KAFKA-20699
Project: Kafka
Issue Type: Bug
Components: streams-test-utils
Affects Versions: 4.2.1, 4.3.0, 4.2.0
Reporter: Souquières Adam
h2. Summary
{{TopologyTestDriver.getStateStore}} mutates the task's live record context as
a side effect, wiping the in-flight record's metadata.
h2. Affects
All releases that include KAFKA-19638 ([#20403|https://github.com/apache/kafka/pull/20403]).
h2. Description
Before KAFKA-19638, the dummy record context was set once at task construction
and {{getStateStore}} was read-only.
Since that change, {{getStateStore}} unconditionally calls {{setRecordContext}}
with a dummy context ({{null}} topic, offset and partition {{-1}}, timestamp
{{0}}) and never restores the previous one. A store lookup therefore mutates
the task's live record context as a side effect.
When a store handle is fetched while a record's context is active (for example,
an interactive query interleaved with processing, or a test seam through which
production code resolves a store handle via {{TopologyTestDriver}}), the
in-flight record's {{RecordMetadata}} is wiped:
* {{recordMetadata().topic()}} returns {{null}}
* {{offset()}} and {{partition()}} become {{-1}}
Consequences:
* Code that builds provenance from {{recordMetadata().topic()}} throws a {{NullPointerException}}.
* Direct store writes capture timestamp {{0}}.
h2. Why it's not always visible
The normal {{process()}} path masks the bug because {{doProcess}} rebuilds the
context per record. It surfaces on:
* direct store writes after a lookup, and
* any context read not preceded by a fresh {{process()}}.
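The masking effect can be illustrated with a small stand-alone model. This is only a sketch: {{ModelContext}} and {{ModelTask}} are hypothetical names invented for illustration, they do not depend on Kafka, and they are not the real task or context classes. The model assumes only what is described above: the lookup installs the dummy context, and {{process()}} rebuilds the context for every record.

```java
// Simplified stand-in for a record context. Hypothetical; not Kafka's actual class.
final class ModelContext {
    final String topic;
    final int partition;
    final long offset;
    final long timestamp;

    ModelContext(String topic, int partition, long offset, long timestamp) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.timestamp = timestamp;
    }

    // Dummy values as described in the report: null topic, -1 offset/partition, timestamp 0.
    static ModelContext dummy() {
        return new ModelContext(null, -1, -1L, 0L);
    }
}

// Hypothetical model of a task; method names only loosely mirror the real ones.
final class ModelTask {
    private ModelContext current = ModelContext.dummy();
    private long lastWriteTimestamp = -1L;

    // Normal path: the context is rebuilt for every record, which hides stale state.
    void process(String topic, int partition, long offset, long timestamp) {
        current = new ModelContext(topic, partition, offset, timestamp);
    }

    // Reported behavior: the lookup replaces the live context and never restores it.
    void getStateStore() {
        current = ModelContext.dummy();
    }

    // A direct store write captures the timestamp of whatever context is current.
    void writeDirect() {
        lastWriteTimestamp = current.timestamp;
    }

    ModelContext context() {
        return current;
    }

    long lastWriteTimestamp() {
        return lastWriteTimestamp;
    }
}
```

In this model, calling {{process("orders", 0, 42L, 1000L)}} followed by {{getStateStore()}} leaves {{context().topic}} as {{null}}, and a subsequent {{writeDirect()}} records timestamp {{0}}. A later {{process()}} call installs a fresh context, which is why the problem stays hidden on the normal path.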
h2. Proposed Fix
Only set the dummy context when none exists yet; never overwrite a live one.
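A minimal sketch of the guard, again as a stand-alone model rather than a patch against the real code. {{SketchContext}} and {{GuardedTask}} are hypothetical names, and treating "no context exists yet" as a {{null}} field is an assumption made for illustration; the actual fix may detect this differently.

```java
// Simplified stand-in for a record context. Hypothetical; not Kafka's actual class.
final class SketchContext {
    final String topic;
    final int partition;
    final long offset;
    final long timestamp;

    SketchContext(String topic, int partition, long offset, long timestamp) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.timestamp = timestamp;
    }

    // Dummy values as described in the report: null topic, -1 offset/partition, timestamp 0.
    static SketchContext dummy() {
        return new SketchContext(null, -1, -1L, 0L);
    }
}

// Hypothetical model of a task whose store lookup preserves a live context.
final class GuardedTask {
    // Assumption: null means that no context has been set yet.
    private SketchContext current;

    // Normal path: install the context for the record being processed.
    void process(String topic, int partition, long offset, long timestamp) {
        current = new SketchContext(topic, partition, offset, timestamp);
    }

    // Guarded lookup: fall back to the dummy only when nothing has been set.
    void getStateStore() {
        if (current == null) {
            current = SketchContext.dummy();
        }
    }

    SketchContext context() {
        return current;
    }
}
```

With this guard, a lookup on a fresh {{GuardedTask}} still yields the dummy context, while a lookup issued after {{process("orders", 0, 42L, 1000L)}} leaves the topic, partition, offset, and timestamp of that record untouched.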