[ 
https://issues.apache.org/jira/browse/KAFKA-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032525#comment-17032525
 ] 

Guozhang Wang commented on KAFKA-8307:
--------------------------------------

Thanks for the summary [~vvcephei]. I agree with you that having a verification 
mechanism to check that two topologies are compatible or equal would be useful. 
Besides that, though, I'm thinking that Streams itself should be robust to how 
the user code determines the ordering of the operators (and, more importantly, 
the naming suffix of the operators) -- since we now have an internal logical 
representation of the topology before generating the physical Topology, we 
should make the generation process "deterministic", such that no matter 
whether you write:

stream1.join(stream2)
stream1.groupBy().aggregate()

OR

stream1.groupBy().aggregate()
stream1.join(stream2)

the generated topology would be the same in terms of the operator ordering 
(today the two would be different).
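
To make the idea concrete, here is a minimal sketch of order-independent 
naming. It is not how Streams generates names today: the `Node` class, the 
`canonicalKey()` rule, and the suffix format are all assumptions made up for 
illustration. The point is only that suffixes are handed out after sorting 
the logical nodes by their content, not in the order the user declared them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DeterministicNaming {

    /** Hypothetical logical node: an operator kind plus the inputs it reads from. */
    static final class Node {
        final String kind;
        final List<String> inputs;

        Node(String kind, List<String> inputs) {
            this.kind = kind;
            this.inputs = inputs;
        }

        /** Key derived only from the node's content, never from when it was declared. */
        String canonicalKey() {
            return kind + ":" + String.join(",", inputs);
        }
    }

    /** Sorts nodes by canonical key, then hands out index suffixes in that order. */
    static List<String> assignNames(List<Node> nodesInUserOrder) {
        List<Node> sorted = new ArrayList<>(nodesInUserOrder);
        sorted.sort(Comparator.comparing(Node::canonicalKey));
        List<String> names = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            // Assumed suffix format: operator kind plus a zero-padded index.
            names.add(String.format("%s-%010d", sorted.get(i).kind, i));
        }
        return names;
    }

    public static void main(String[] args) {
        Node join = new Node("JOIN", List.of("stream1", "stream2"));
        Node aggregate = new Node("AGGREGATE", List.of("stream1"));
        // Both declaration orders yield the same list of names.
        System.out.println(assignNames(List.of(join, aggregate)));
        System.out.println(assignNames(List.of(aggregate, join)));
    }
}
```

A real implementation would need a canonical key that also distinguishes 
nodes with identical kind and inputs; this sketch ignores that case.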


> Kafka Streams should provide some mechanism to determine topology equality 
> and compatibility
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8307
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8307
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>              Labels: user-experience
>
> Currently, Streams provides no mechanism to compare two topologies. This is a 
> common operation when users want to have tests verifying that they don't 
> accidentally alter their topology. They would save the known-good topology 
> and then add a unit test verifying the current code against that known-good 
> state.
> However, because there's no way to do this comparison properly, everyone is 
> reduced to using the string format of the topology (from 
> `Topology#describe().toString()`). The major drawback is that the string 
> format is meant for human consumption. It is neither machine-parseable nor 
> stable. So, these compatibility tests are doomed to fail when any minor, 
> non-breaking change is made to either the application or the library. 
> This trains everyone to update the test whenever it fails, undermining its 
> utility.
> We should fix this problem and provide mechanisms both to serialize the 
> topology and to compare two topologies for compatibility. All in all, I think 
> we need:
> # a way to serialize/deserialize topology structure in a machine-parseable 
> format that is future-compatible. Offhand, I'd recommend serializing the 
> topology structure as JSON, and establishing a policy that attributes should 
> only be added to the object graph, never removed. Note, it's out of scope to 
> be able to actually run a deserialized topology; we only want to save and 
> load the structure (not the logic) to facilitate comparisons.
> # a method to verify the *equality* of two topologies... This method tells 
> you that the two topologies are structurally identical. We can't know if the 
> logic of any operator has changed, only if the structure of the graph is 
> changed. We can consider whether other graph properties, like serdes, should 
> be included.
> # a method to verify the *compatibility* of two topologies... This method 
> tells you that moving from topology A to topology B does not require an 
> application reset. Note that this operation is not commutative: 
> `A.compatibleWith(B)` does not imply `B.compatibleWith(A)`. We can discuss 
> whether `A.compatibleWith(B) && B.compatibleWith(A)` implies `A.equals(B)` (I 
> think not necessarily, because we may want "equality" to be stricter than 
> "compatibility").
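
The difference between the equality and compatibility checks in the 
description above can be sketched as follows. Everything here is 
hypothetical: `TopologyShape` is not a Streams class, and the compatibility 
rule (every state store in the old topology must still exist in the new one) 
is a placeholder chosen only to show the asymmetry, not a claim about when 
an application reset is actually required.

```java
import java.util.Set;

public class TopologyShape {
    final Set<String> processorNames;
    final Set<String> stateStoreNames;

    TopologyShape(Set<String> processorNames, Set<String> stateStoreNames) {
        this.processorNames = processorNames;
        this.stateStoreNames = stateStoreNames;
    }

    /** Strict check: same processors and same state stores. */
    boolean structurallyEquals(TopologyShape other) {
        return processorNames.equals(other.processorNames)
                && stateStoreNames.equals(other.stateStoreNames);
    }

    /** ASSUMED rule: moving from this topology to next keeps every existing store. */
    boolean compatibleWith(TopologyShape next) {
        return next.stateStoreNames.containsAll(stateStoreNames);
    }

    public static void main(String[] args) {
        TopologyShape a = new TopologyShape(Set.of("source", "count"), Set.of("counts"));
        TopologyShape b = new TopologyShape(
                Set.of("source", "count", "filter"), Set.of("counts", "totals"));
        TopologyShape c = new TopologyShape(
                Set.of("source", "count", "filter"), Set.of("counts"));

        // Not commutative: b adds a store, so going back to a would drop it.
        System.out.println(a.compatibleWith(b) + " " + b.compatibleWith(a));
        // Mutually compatible, yet not structurally equal.
        System.out.println(a.compatibleWith(c) && c.compatibleWith(a));
        System.out.println(a.structurallyEquals(c));
    }
}
```

Under this placeholder rule, mutual compatibility does not imply equality, 
which matches the suggestion that "equality" may be the stricter check.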



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
