> > Changing `TaskId` to an abstract class and deprecate all util functions / > fields that we do not > want to expose any longer
I had considered doing exactly this, where we just convert the existing public TaskId class to an interface or an abstract class rather than introducing a new one, but it turns out that modifying a class in this way is not backwards compatible (see Evolving API packages <https://wiki.eclipse.org/Evolving_Java-based_APIs_2#:~:text=(1)%20API%20class-interface,method%20declared%20in%20a%20class.> and Evolving API Classes <https://wiki.eclipse.org/Evolving_Java-based_APIs_2#Evolving_API_classes_-_API_methods_and_constructors>). So it's off the table. However, that still leaves the door open for reclaiming the existing TaskId class and just deprecating the inappropriate public APIs, then extending this with a subclass. But this just does not seem like an appropriate application of subclassing (to me), as that hierarchy structure is exactly what interfaces are for. I tried to touch on this in my last reply but admittedly did so only briefly, and rather dismissively. I would be happy to consider this option as well, though as is probably clear by now, it's not my personal preference. Thoughts? On Sun, May 16, 2021, 9:17 PM Guozhang Wang <wangg...@gmail.com> wrote: > Sophie, > > Thanks for the nice summary. Originally I'm leaning towards option 2) since > changing this name to others could be a pretty large conceptual change to > users, and I think the concerns you raised is good as well that we cannot > reuse the `taskId()` function names any more.. > > After pondering on it a bit, what about 1) keeping `TaskId` in the current > package as that's not our main concern anyways, 2) changing `TaskId` to an > abstract class and deprecate all util functions / fields that we do not > want to expose any longer, and 3) introducing a new class that extends the > o.a.k.streams.processor.TaskId for internal implementations? > > > Guozhang > > > On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman > <sop...@confluent.io.invalid> wrote: > > > Sorry, one correction regarding the discussion on naming below: the other > > con for reusing TaskId as the interface name is that we would not be able > > to follow the standard getter naming conventions and introduce new > taskId() > > APIs to replace the deprecated ones, as the deprecated methods have > already > > claimed that name. So we would have to come up with a different name for > > these APIs, for example taskInfo() would require the TaskId interface > > which is a bit confusing. Or using getTaskId() temporarily until the > > deprecated methods can be removed, which goes against the established > > convention of no "get" in the getters. Another possibility is something > > like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning > > taskId somewhere in the name. > > > > Thoughts? Given this awkwardness I'm somewhat leaning towards renaming > the > > interface after all, as in the example above to "TaskInfo". Or we could > > take a similar approach and just build on the original TaskId name, > calling > > it TaskIdInfo or something like that. Any thoughts? As always the KIP has > > been updated with what I am proposing, so you can take a look yourself. > > > > On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <sop...@confluent.io > > > > wrote: > > > > > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread > since > > > it turned out to be more nuanced > > > and grew in scope since I initially proposed it. Thanks John for > > > the suggestions/questions on the [VOTE] thread, > > > I have copied them over and addressed them below. > > > > > > Before that, I just want to apologize to everyone for bringing this to > a > > > vote immediately. I truly thought this would > > > be a trivial KIP that just fixed what seemed like a bug in the API, but > > it > > > slowly grew in scope as I wrangled with > > > the question of whether TaskId was already a public API. To clarify, it > > is > > > -- a TaskId is returned by the > > > Processor/StateStoreContext.taskId() methods, I just hadn't noticed > those > > > APIs at the time of writing this KIP. > > > After finding those APIs and confirming that it was public, I began to > > > wonder whether it really should be. > > > > > > 1. "Topic Group" is a vestigial term from the early days of > > >> the codebase. We call a "topic group" a "subtopology" in the > > >> public interface now (although "topic group" is still used > > >> internally some places). For user-facing consistency, we > > >> should also use "subtopologyId" in your proposal. > > > > > > > > > 100% agree, I remember being very confused by the term "topic group" > > when > > > I was first getting to know > > > the Streams code base, and wondering why it wasn't just called > > > "subtopology". This would be the perfect > > > opportunity to migrate to the more clear terminology where appropriate. > > > I've updated the KIP to reflect this. > > > > > > 2. I'm wondering if it's really necessary to introduce this > > >> interface at all. I think your motivation is to be able to > > >> get the subtopologyId and partition via TaskMetadata, right? > > >> Why not just add those methods to TaskMetadata? Stepping > > >> back, the concept of metadata about an identifier is a bit > > >> elaborate. > > > > > > > > > I think if the only issue we wanted to address was the TaskMetadata > > > returning a string representation of a > > > TaskId rather than the real thing, as was originally the case, then I > > > would agree with just unfolding the API > > > to expose the TaskId fields directly. However while digging into the > > > question of whether TaskId was public, it > > > became clear that the class had some awkward features for a public API: > > > public fields with no getters, many > > > utility methods for internal functionality that had to be made public > and > > > cluttered up the API, etc. So I believe > > > this KIP can and should take the opportunity to clean up the TaskId API > > > altogether, not just for one getter in > > > TaskMetadata. And we can't deprecate an API without adding a > replacement > > > for users to migrate onto, since > > > the concept of a TaskId itself is still very much relevant. That's the > > > primary motivation behind introducing this > > > new interface. Of course technically this could be done by retaining > the > > > existing TaskId class and just moving > > > the things that should not be public to the new internal class, but > > that's > > > just a worse version of the same idea, > > > and would require a somewhat awkward deprecation cycle. Plus, given > that > > > TaskId is intended to just be a > > > wrapper for some misc information about a Task, a public interface > feels > > > more natural than a class. > > > > > > It also gives us more flexibility in the future, as it's much easier to > > > just add a field to the TaskId (with a new > > > getter on the interface) then to track down all places where we return > > > the TaskId fields, ie every topicGroupId() > > > and partition() getters on an API, and add a new getter API for this > > field. > > > > > > Please refer to the KIP for the current proposal, including all > > deprecated > > > and new APIs. I hope the latest > > > iteration will make everyone happy. The only potentially controversial > > > question remaining is how to name > > > the new interface. Here are all the options that seem reasonable to me, > > > though chose to go with the 2nd one: > > > > > > 1) Keep the name TaskId for the interface, put the interface in the > > > o.a.k.streams package, and call the internal class > > > TaskIdImpl. This will require a fully qualified class in any file > > that > > > references both the new interface and the > > > deprecated public TaskId class. > > > 2) Same as the above, but allow the internal class to also retain the > > name > > > TaskId. This will also require a fully qualified > > > class name whenever any of the three are referenced in the same > file, > > > though I don't think there is likely to be a single > > > file where the new internal TaskId class appears along with the old > > > public TaskId class but not the new TaskId interface, > > > or vice versa, so the impact is probably exactly the same as the > > above > > > option. > > > 3) Name the interface something else entirely (eg TaskInfo) and keep > the > > > name TaskId for the new internal class. The > > > new internal TaskId and the deprecated public TaskId class most > > likely > > > appear together. This option is probably the > > > least annoying with respect to requiring fully qualified class > names, > > > but I expect it's still on par with how often this > > > would be required in option 1 or 2. > > > > > > Given that both the new public interface and the deprecated public > TaskId > > > class are almost always going to appear together, > > > we can't avoid fully qualified path names until the deprecated class > can > > > be removed, unless we give the interface a different > > > name. And most of the time only the internal TaskId class will be > > > referenced anyways, so the scope of fully qualified names > > > should still be small with any option. In fact the vast majority of the > > > codebase will be using the new internal class, so leaving it > > > as "TaskId" will help prevent a truly massive PR, as only the import > line > > > will need to be changed, vs hundreds/thousands of lines > > > in which the class name of the TaskId object appears. So I would rather > > > avoid option 1. > > > > > > With option 2 we'll need to suppress a spotbugs warning, while with > > option > > > 3 we are changing the name of a pretty fundamental > > > concept throughout Streams, so users will need to mentally adapt to > > > thinking about TaskInfo (or whatever) instead of in terms > > > of TaskId. That is why I personally prefer option 2. > > > > > > > > > Thanks for the discussion, sorry this turned into such a long > response. I > > > think we're on the same page now but I want to make > > > sure the background and my general thought process behind all this was > > > clear to all. I'll kick off the vote again but please respond > > > here with any further concerns and save the [VOTE] thread for voting > and > > > maybe the occasional small question. > > > > > > -Sophie > > > > > > > > -- > -- Guozhang >