I like the idea; I'll try to implement that now.

EDIT: Looking at this more closely, I have some more thoughts.
Why limit this to cases where people name the repartition topic? Since we now have a graph, we could keep a reference to the repartition graph node and, at this point in the code, always reuse that node for repartitioning.

But this could be tricky, because it would still affect existing topologies. For example, consider a user with multiple `KGroupedStream` calls that each require a repartition. Today that creates multiple repartition topics, but it also means we increment the processor counter N times (N being the number of repartition topics). If we adopt this approach, the user names the repartition topic, and we reuse the first repartition topic created, then we change the numbering of all downstream operations, including changelog topics and any other repartition topics. This "skipped increment" is similar to what happened when we reused a source topic as the changelog for source `KTable`s.

While I realize most users will probably name all of their repartition topics, if we reuse repartition topics in-line they'll also have to make sure they name any changelog topics. With the current optimization approach the numbering isn't affected; we only move the nodes around. Additionally, I'm not sure how this would interact with the current optimization approach (it might change it, since if we keep repartition node references as we go, we could get "automatic" partial merging?).

I think this approach could be worth looking into, but as an immediate follow-on PR to this one, since it requires some thought. WDYT?

[ Full content available at: https://github.com/apache/kafka/pull/5709 ]
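To make the numbering concern concrete, here is a minimal, hypothetical sketch. It does not use the real Kafka Streams builder. It assumes a simplified builder that hands out names from one shared counter (prefix plus a zero-padded index) and compares two strategies: creating one repartition node per grouping versus reusing the first repartition node. The class name, the `KSTREAM-REPARTITION-` prefix, and `groupAndAggregate()` are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of processor/store naming driven by a single shared counter.
 * NOT the actual Kafka Streams internals; prefixes and methods are assumptions.
 */
public class RepartitionNamingSketch {

    private int index = 0;                        // shared processor counter
    private final boolean reuseRepartitionNode;   // strategy under comparison
    private String sharedRepartitionNode = null;  // first repartition node, when reusing

    public RepartitionNamingSketch(boolean reuseRepartitionNode) {
        this.reuseRepartitionNode = reuseRepartitionNode;
    }

    /** Every new name consumes one counter value. */
    private String newName(String prefix) {
        return prefix + String.format("%010d", index++);
    }

    /** A grouping + aggregation that needs a repartition first; returns the store name. */
    public String groupAndAggregate() {
        if (!reuseRepartitionNode || sharedRepartitionNode == null) {
            // Creating a repartition node increments the counter.
            String repartition = newName("KSTREAM-REPARTITION-");
            if (reuseRepartitionNode) {
                sharedRepartitionNode = repartition;
            }
        }
        // When the node is reused, no increment happens here, so this store name
        // (and any changelog topic derived from it) shifts.
        return newName("KSTREAM-AGGREGATE-STATE-STORE-");
    }

    /** Builds a topology with the given number of aggregations and returns the store names. */
    public static List<String> build(boolean reuse, int aggregations) {
        RepartitionNamingSketch builder = new RepartitionNamingSketch(reuse);
        List<String> stores = new ArrayList<>();
        for (int i = 0; i < aggregations; i++) {
            stores.add(builder.groupAndAggregate());
        }
        return stores;
    }

    public static void main(String[] args) {
        System.out.println("one repartition node per grouping: " + build(false, 3));
        System.out.println("reused repartition node:           " + build(true, 3));
    }
}
```

In this simplified model, the first store name is identical under both strategies. Every later store name differs (for example `...0000000003` vs `...0000000002` for the second aggregation) because the reused repartition node no longer consumes a counter value. Any unnamed changelog topic derived from those names would change as well.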
