>
> Changing `TaskId` to an abstract class and deprecate all util functions /
> fields that we do not
> want to expose any longer


I had considered doing exactly this, where we just convert the existing
public TaskId class to an interface
or an abstract class rather than introducing a new one, but it turns out
that modifying a class in this way is
not backwards compatible (see Evolving API packages
<https://wiki.eclipse.org/Evolving_Java-based_APIs_2#:~:text=(1)%20API%20class-interface,method%20declared%20in%20a%20class.>
 and Evolving API Classes
<https://wiki.eclipse.org/Evolving_Java-based_APIs_2#Evolving_API_classes_-_API_methods_and_constructors>).
So it's off the table.

However, that still leaves the door open for reclaiming the existing TaskId
class and just deprecating the
inappropriate public APIs, then extending this with a subclass. But this
just does not seem like an appropriate
application of subclassing (to me), as that hierarchy structure is exactly
what interfaces are for. I tried to touch
on this in my last reply but admittedly did so only briefly, and rather
dismissively. I would be happy to consider
this option as well, though as is probably clear by now, it's not my
personal preference.

Thoughts?

On Sun, May 16, 2021, 9:17 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Sophie,
>
> Thanks for the nice summary. Originally I'm leaning towards option 2) since
> changing this name to others could be a pretty large conceptual change to
> users, and I think the concerns you raised is good as well that we cannot
> reuse the `taskId()` function names any more..
>
> After pondering on it a bit, what about 1) keeping `TaskId` in the current
> package as that's not our main concern anyways, 2) changing `TaskId` to an
> abstract class and deprecate all util functions / fields that we do not
> want to expose any longer, and 3) introducing a new class that extends the
> o.a.k.streams.processor.TaskId for internal implementations?
>
>
> Guozhang
>
>
> On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
> <sop...@confluent.io.invalid> wrote:
>
> > Sorry, one correction regarding the discussion on naming below: the other
> > con for reusing TaskId as the interface name is that we would not be able
> > to follow the standard getter naming conventions and introduce new
> taskId()
> > APIs to replace the deprecated ones, as the deprecated methods have
> already
> > claimed that name. So we would have to come up with a different name for
> > these APIs, for example taskInfo()  would require the TaskId interface
> > which is a bit confusing. Or using getTaskId() temporarily until the
> > deprecated methods can be removed, which goes against the established
> > convention of no "get" in the getters. Another possibility is something
> > like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
> > taskId somewhere in the name.
> >
> > Thoughts? Given this awkwardness I'm somewhat leaning towards renaming
> the
> > interface after all, as in the example above to "TaskInfo". Or we could
> > take a similar approach and just build on the original TaskId name,
> calling
> > it TaskIdInfo or something like that. Any thoughts? As always the KIP has
> > been updated with what I am proposing, so you can take a look yourself.
> >
> > On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <sop...@confluent.io
> >
> > wrote:
> >
> > > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread
> since
> > > it turned out to be more nuanced
> > > and grew in scope since I initially proposed it. Thanks John for
> > > the suggestions/questions on the [VOTE] thread,
> > > I have copied them over and addressed them below.
> > >
> > > Before that, I just want to apologize to everyone for bringing this to
> a
> > > vote immediately. I truly thought this would
> > > be a trivial KIP that just fixed what seemed like a bug in the API, but
> > it
> > > slowly grew in scope as I wrangled with
> > > the question of whether TaskId was already a public API. To clarify, it
> > is
> > > -- a TaskId is returned by the
> > > Processor/StateStoreContext.taskId() methods, I just hadn't noticed
> those
> > > APIs at the time of writing this KIP.
> > > After finding those APIs and confirming that it was public, I began to
> > > wonder  whether it really should be.
> > >
> > > 1. "Topic Group" is a vestigial term from the early days of
> > >> the codebase. We call a "topic group" a "subtopology" in the
> > >> public interface now (although "topic group" is still used
> > >> internally some places). For user-facing consistency, we
> > >> should also use "subtopologyId" in your proposal.
> > >
> > >
> > > 100% agree, I remember being very confused by the term "topic group"
> > when
> > > I was first getting to know
> > > the Streams code base, and wondering why it wasn't just called
> > > "subtopology". This would be the perfect
> > > opportunity to migrate to the more clear terminology where appropriate.
> > > I've updated the KIP to reflect this.
> > >
> > > 2. I'm wondering if it's really necessary to introduce this
> > >> interface at all. I think your motivation is to be able to
> > >> get the subtopologyId and partition via TaskMetadata, right?
> > >> Why not just add those methods to TaskMetadata? Stepping
> > >> back, the concept of metadata about an identifier is a bit
> > >> elaborate.
> > >
> > >
> > > I think if the only issue we wanted to address was the TaskMetadata
> > > returning a string representation of a
> > > TaskId rather than the real thing, as was originally the case, then I
> > > would agree with just  unfolding the API
> > > to expose the TaskId fields directly. However while digging into the
> > > question of whether TaskId was public, it
> > > became clear that the class had some awkward features for a public API:
> > > public fields with no getters, many
> > > utility methods for internal functionality that had to be made public
> and
> > > cluttered up the API, etc. So I believe
> > > this KIP can and should take the opportunity to clean up the TaskId API
> > > altogether, not just for one getter in
> > > TaskMetadata. And we can't deprecate an API without adding a
> replacement
> > > for users to migrate onto, since
> > > the concept of a TaskId itself is still very much relevant. That's the
> > > primary motivation behind introducing this
> > > new interface. Of course technically this could be done by retaining
> the
> > > existing TaskId class and just moving
> > > the things that should not be public to the new internal class, but
> > that's
> > > just a worse version of the same idea,
> > > and would require a somewhat awkward deprecation cycle. Plus, given
> that
> > > TaskId is intended to just be a
> > > wrapper for some misc information about a Task, a public interface
> feels
> > > more natural than a class.
> > >
> > > It also gives us more flexibility in the future, as it's much easier to
> > > just add a field to the TaskId (with a new
> > > getter on the interface) then to track  down all places where we return
> > > the TaskId fields, ie every topicGroupId()
> > > and partition() getters on an API, and add a new getter API for this
> > field.
> > >
> > > Please refer to the KIP for the current proposal, including all
> > deprecated
> > > and new APIs. I hope the latest
> > > iteration will make everyone happy. The only potentially controversial
> > > question remaining is how to name
> > > the new interface. Here are all the options that seem reasonable to me,
> > > though chose to go with the 2nd one:
> > >
> > > 1) Keep the name TaskId for the interface, put the interface in the
> > > o.a.k.streams package, and call the internal class
> > >     TaskIdImpl. This will require a fully qualified class in any file
> > that
> > > references both the new interface and the
> > >     deprecated public TaskId class.
> > > 2) Same as the above, but allow the internal class to also retain the
> > name
> > > TaskId. This will also require a fully qualified
> > >     class name whenever any of the three are referenced in the same
> file,
> > > though I don't think there is likely to be a single
> > >     file where the new internal TaskId class appears along with the old
> > > public TaskId class but not the new TaskId interface,
> > >     or vice versa, so the impact is probably exactly the same as the
> > above
> > > option.
> > > 3) Name the interface something else entirely (eg TaskInfo) and keep
> the
> > > name TaskId for the new internal class. The
> > >     new internal TaskId and the deprecated public TaskId class most
> > likely
> > > appear together. This option is probably the
> > >     least annoying with respect to requiring fully qualified class
> names,
> > > but I expect it's still on par with how often this
> > >     would be required in option 1 or 2.
> > >
> > > Given that both the new public interface and the deprecated public
> TaskId
> > > class are almost always going to appear together,
> > > we can't avoid fully qualified path names until the deprecated class
> can
> > > be removed, unless we give the interface a different
> > > name. And most of the time only the internal TaskId class will be
> > > referenced anyways, so the scope of fully qualified names
> > > should still be small with any option. In fact the vast majority of the
> > > codebase will be using the new internal class, so leaving it
> > > as "TaskId" will help prevent a truly massive PR, as only the import
> line
> > > will need to be changed, vs hundreds/thousands of lines
> > > in which the class name of the TaskId object appears. So I would rather
> > > avoid option 1.
> > >
> > > With option 2 we'll need to suppress a spotbugs warning, while with
> > option
> > > 3 we are changing the name of a pretty fundamental
> > > concept throughout Streams, so users will need to mentally adapt to
> > > thinking about TaskInfo (or whatever) instead of in terms
> > > of TaskId. That is why I personally prefer option 2.
> > >
> > >
> > > Thanks for the discussion, sorry this turned into such a long
> response. I
> > > think we're on the same page now but I want to make
> > > sure the background and my general thought process behind all this was
> > > clear to all. I'll kick off the vote again but please respond
> > > here with any further concerns and save the [VOTE] thread for voting
> and
> > > maybe the occasional small question.
> > >
> > > -Sophie
> > >
> >
>
>
> --
> -- Guozhang
>

Reply via email to