I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since it turned out to be more nuanced and grew in scope since I initially proposed it. Thanks John for the suggestions/questions on the [VOTE] thread, I have copied them over and addressed them below.
Before that, I just want to apologize to everyone for bringing this to a vote immediately. I truly thought this would be a trivial KIP that just fixed what seemed like a bug in the API, but it slowly grew in scope as I wrangled with the question of whether TaskId was already a public API. To clarify, it is -- a TaskId is returned by the Processor/StateStoreContext.taskId() methods, I just hadn't noticed those APIs at the time of writing this KIP. After finding those APIs and confirming that it was public, I began to wonder whether it really should be. 1. "Topic Group" is a vestigial term from the early days of > the codebase. We call a "topic group" a "subtopology" in the > public interface now (although "topic group" is still used > internally some places). For user-facing consistency, we > should also use "subtopologyId" in your proposal. 100% agree, I remember being very confused by the term "topic group" when I was first getting to know the Streams code base, and wondering why it wasn't just called "subtopology". This would be the perfect opportunity to migrate to the more clear terminology where appropriate. I've updated the KIP to reflect this. 2. I'm wondering if it's really necessary to introduce this > interface at all. I think your motivation is to be able to > get the subtopologyId and partition via TaskMetadata, right? > Why not just add those methods to TaskMetadata? Stepping > back, the concept of metadata about an identifier is a bit > elaborate. I think if the only issue we wanted to address was the TaskMetadata returning a string representation of a TaskId rather than the real thing, as was originally the case, then I would agree with just unfolding the API to expose the TaskId fields directly. However while digging into the question of whether TaskId was public, it became clear that the class had some awkward features for a public API: public fields with no getters, many utility methods for internal functionality that had to be made public and cluttered up the API, etc. So I believe this KIP can and should take the opportunity to clean up the TaskId API altogether, not just for one getter in TaskMetadata. And we can't deprecate an API without adding a replacement for users to migrate onto, since the concept of a TaskId itself is still very much relevant. That's the primary motivation behind introducing this new interface. Of course technically this could be done by retaining the existing TaskId class and just moving the things that should not be public to the new internal class, but that's just a worse version of the same idea, and would require a somewhat awkward deprecation cycle. Plus, given that TaskId is intended to just be a wrapper for some misc information about a Task, a public interface feels more natural than a class. It also gives us more flexibility in the future, as it's much easier to just add a field to the TaskId (with a new getter on the interface) then to track down all places where we return the TaskId fields, ie every topicGroupId() and partition() getters on an API, and add a new getter API for this field. Please refer to the KIP for the current proposal, including all deprecated and new APIs. I hope the latest iteration will make everyone happy. The only potentially controversial question remaining is how to name the new interface. Here are all the options that seem reasonable to me, though chose to go with the 2nd one: 1) Keep the name TaskId for the interface, put the interface in the o.a.k.streams package, and call the internal class TaskIdImpl. This will require a fully qualified class in any file that references both the new interface and the deprecated public TaskId class. 2) Same as the above, but allow the internal class to also retain the name TaskId. This will also require a fully qualified class name whenever any of the three are referenced in the same file, though I don't think there is likely to be a single file where the new internal TaskId class appears along with the old public TaskId class but not the new TaskId interface, or vice versa, so the impact is probably exactly the same as the above option. 3) Name the interface something else entirely (eg TaskInfo) and keep the name TaskId for the new internal class. The new internal TaskId and the deprecated public TaskId class most likely appear together. This option is probably the least annoying with respect to requiring fully qualified class names, but I expect it's still on par with how often this would be required in option 1 or 2. Given that both the new public interface and the deprecated public TaskId class are almost always going to appear together, we can't avoid fully qualified path names until the deprecated class can be removed, unless we give the interface a different name. And most of the time only the internal TaskId class will be referenced anyways, so the scope of fully qualified names should still be small with any option. In fact the vast majority of the codebase will be using the new internal class, so leaving it as "TaskId" will help prevent a truly massive PR, as only the import line will need to be changed, vs hundreds/thousands of lines in which the class name of the TaskId object appears. So I would rather avoid option 1. With option 2 we'll need to suppress a spotbugs warning, while with option 3 we are changing the name of a pretty fundamental concept throughout Streams, so users will need to mentally adapt to thinking about TaskInfo (or whatever) instead of in terms of TaskId. That is why I personally prefer option 2. Thanks for the discussion, sorry this turned into such a long response. I think we're on the same page now but I want to make sure the background and my general thought process behind all this was clear to all. I'll kick off the vote again but please respond here with any further concerns and save the [VOTE] thread for voting and maybe the occasional small question. -Sophie