Chuckame commented on PR #14157: URL: https://github.com/apache/kafka/pull/14157#issuecomment-2147032733
After deep diving into the `KeyValueStore` and `KTableValueGetter` implementation layers, there is still one blocker that prevents the full fix: none of the `KTableValueGetter` implementations that use runtime mappers (based on the deserialized data) are materialized, so they have no backing store. Each of them would need to serialize the mapped data on the fly to have raw bytes to hash, which ultimately prevents using the original raw data for hashing:

- `KTableMap[Values]ValueGetter`
- `KTableFilterValueGetter`
- `KTableTransformValuesGetter`
- `KTableKTableAbstractJoins` (for ktable-ktable joins on the same key)

In other words, if just before the foreign-key join we perform an operation like `join`/`leftJoin` (same key), `map`, `mapValues`, `filter` or `transformValues`, there is no previous raw data: the value is computed on the fly, so it has no backing store.

Even if we totally revamp the raw store layer as @guozhangwang suggested, we will still have the same issue.

### Idea

We could generate the hash on the original raw data **before** mapping/transforming, but this would be a breaking change, as the hash will be different if a user upgrades kafka-streams to this version (previously the hash was computed from the mapped value). This change would need a new version for `SubscriptionResponseWrapper` (currently v0).

Pros:
- We now have access to the original raw data, which bypasses the deserialization step
- We gain performance, as we no longer `deserialize -> transform -> serialize -> hash` but just `hash`

Cons:
- Breaking change for existing hashes: users need to empty the stores, or all the events triggered by the right side will be skipped because the hash will always differ (which is the bug we currently have). This is why we need to introduce version 1 of `SubscriptionResponseWrapper`

Would you allow this breaking change @mjsax ?
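To make the trade-off concrete, here is a minimal, self-contained sketch of the two hashing placements. It is not Kafka Streams code: `HashPlacementSketch`, its helper methods, the UTF-8 string serde and the SHA-256 hash are all hypothetical stand-ins chosen for illustration, and the real join may use a different hash function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Kafka Streams.
final class HashPlacementSketch {

    // Stand-in hash (assumption): SHA-256 over the raw bytes.
    static byte[] hash(byte[] raw) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(raw);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in serde (assumption): plain UTF-8 strings.
    static String deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Behaviour described above as the previous one:
    // deserialize -> transform -> serialize -> hash.
    static byte[] hashOfMappedValue(byte[] originalRaw, Function<String, String> mapper) {
        return hash(serialize(mapper.apply(deserialize(originalRaw))));
    }

    // Proposed behaviour: hash the original raw bytes before any mapping.
    static byte[] hashOfOriginalRaw(byte[] originalRaw) {
        return hash(originalRaw);
    }

    public static void main(String[] args) {
        byte[] raw = serialize("order-42");
        Function<String, String> mapper = String::toUpperCase; // plays the role of mapValues

        byte[] oldHash = hashOfMappedValue(raw, mapper);
        byte[] newHash = hashOfOriginalRaw(raw);

        // The two placements disagree as soon as the mapper changes the value,
        // which is why hashes stored by an older version would no longer match.
        System.out.println("hashes equal: " + Arrays.equals(oldHash, newHash)); // prints "hashes equal: false"
    }
}
```

In this sketch the two hashes only coincide when the mapper leaves the serialized bytes unchanged, which illustrates why the wrapper would need a version bump to tell the two hashing schemes apart.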