Matthias J. Sax created KAFKA-7293:
--------------------------------------
Summary: Merge followed by groupByKey/join might violate
co-partitioning
Key: KAFKA-7293
URL: https://issues.apache.org/jira/browse/KAFKA-7293
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Matthias J. Sax
The merge() operation can be applied to input KStreams that have a different
number of tasks (i.e., input topic partitions). In this case, the input topics
are not co-partitioned, and thus the resulting KStream is not partitioned by
key, even if each input KStream is partitioned on its own.
Because no "repartitionRequired" flag is set on the input KStreams, the flag
is also not set on the output KStream. Hence, if a groupByKey() or join()
operation is applied to the output KStream, we don't insert a repartition
topic. However, repartitioning would be required because the KStream is not
partitioned.
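
To illustrate the problem, here is a minimal, self-contained sketch in plain Java (no Kafka dependency). It assumes that a task with id i reads partition i of every input topic and that a key is mapped to a partition by hash modulo partition count. The class name, the method tasksForKey(), and the use of String.hashCode() are hypothetical simplifications for this sketch; the real default partitioner hashes the serialized key differently.

```java
import java.util.TreeSet;

// Illustrative simulation only; not Kafka Streams code.
class MergeCoPartitioningDemo {

    // Simplified stand-in for a partitioner: hash of the key modulo partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Returns the ids of all tasks that see records with the given key after
    // merging input topics with the given partition counts, assuming that
    // task i reads partition i of each input topic.
    static TreeSet<Integer> tasksForKey(String key, int... partitionCounts) {
        TreeSet<Integer> tasks = new TreeSet<>();
        for (int count : partitionCounts) {
            tasks.add(partitionFor(key, count));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Two input topics with 2 and 3 partitions, merged into one KStream.
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + " -> tasks " + tasksForKey(key, 2, 3));
        }
    }
}
```

Whenever tasksForKey() returns more than one task id, a groupByKey() aggregation without a repartition topic would compute a separate partial result for that key in each of those tasks.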
We cannot detect this at compile time, because the number of partitions is
unknown, and thus we cannot decide whether repartitioning is required.
However, we can add a runtime check, similar to the one for join(), that
checks whether the data is correctly (co-)partitioned and raises a runtime
exception if it is not.
Note that for merge(), in contrast to join(), we should only check for
co-partitioning if the merge() is followed by a groupByKey() or join()
operation.
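
A rough sketch of what such a runtime check could look like, in plain Java without any Kafka dependency. The class MergeCoPartitionCheck, the method validate(), its parameters, and the choice of IllegalStateException are all hypothetical and only illustrate the idea; they are not the existing join() check or a proposed API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed runtime check; names are assumptions.
class MergeCoPartitionCheck {

    // partitionCountByTopic: partition count of each source topic feeding the merge().
    // followedByKeyBasedOp: true if a groupByKey() or join() consumes the merged stream.
    static void validate(Map<String, Integer> partitionCountByTopic,
                         boolean followedByKeyBasedOp) {
        // A merge() that is not followed by a key-based operation needs no check.
        if (!followedByKeyBasedOp) {
            return;
        }
        Set<Integer> distinctCounts = new HashSet<>(partitionCountByTopic.values());
        if (distinctCounts.size() > 1) {
            throw new IllegalStateException(
                "Merged input topics are not co-partitioned: " + partitionCountByTopic);
        }
    }
}
```

For example, validate(Map.of("topic-a", 2, "topic-b", 3), true) would throw, while the same map with followedByKeyBasedOp set to false would pass, matching the note that the check only applies when a groupByKey() or join() follows the merge().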