[GitHub] [kafka] vvcephei commented on issue #5012: KAFKA-4696: Streams standby task assignment should be state-store aware

GitHub Mon, 27 Aug 2018 11:25:52 -0700

Hi @ConcurrencyPractitioner,

I think my statement was a little ambiguous. What I meant was that when you're 
looking at an arbitrary method that takes `TaskId` as a parameter, or a method 
in an arbitrary class that has `TaskId`s stored in fields, and you want to use 
it as a `TaskMetadata`, you would have to cast it. How can you know whether 
it's safe to do so?


In such a context, you don't have access to the assignment protocol version. So 
you have to trace all possible code paths that could set the field/parameter in 
question, or you have to guard every single cast and maintain two code paths: 
what to do if it's really a Metadata or what to do if it's really an Id. This 
doesn't seem tractable in either case.

The purpose of the Map is simply to explicitly render the bijection between 
TaskId and TaskMetadata: namely, every TaskId maps to exactly one TaskMetadata, 
and every TaskMetadata is mapped from exactly one TaskId. This is actually the 
same property you've produced by the non-standard implementation of 
hashCode/equals on TaskMetadata.

You're absolutely right, that we get a much smaller changeset by doing the 
hashCode/equals thing. Without it, you've got to change every `Set<TaskId>` to 
a `Map<TaskId, TaskMetadata>`. But this has two advantages:
1. You get to stick with a "standard" implementation of `hashCode/equals` (or 
better yet, not override these methods at all). What I mean by "standard" here 
is that if I construct two different `TaskMetadata` instances with different 
parameters, they should not be equal to each other. This is not required by the 
Java spec, but I would rate it as *very likely* that a random Java developer 
will just auto-generate the implementations of these methods when adding a new 
field to a class. This means that doing anything fancy with these methods puts 
us at high risk for a bug in the future.
2. It makes all the data-flows that populate those fields explicit. Yes, we 
will have a large change to make right now to change all those fields to Map, 
and to update all the methods that get and set them. But, I didn't follow why 
this burden carries into the future. I see it as a one-time expense, and then 
we never have to worry about it again. Which I see as strictly better than 
having to reason about whether a nominal `TaskId` is actually a `TaskMetadata` 
every time I modify any of these classes. Does this seem reasonable?

[ Full content available at: https://github.com/apache/kafka/pull/5012 ]
This message was relayed via gitbox.apache.org for [email protected]

[GitHub] [kafka] vvcephei commented on issue #5012: KAFKA-4696: Streams standby task assignment should be state-store aware

Reply via email to