Sorry, one correction regarding the discussion on naming below: the other
con for reusing TaskId as the interface name is that we would not be able
to follow the standard getter naming conventions and introduce new taskId()
APIs to replace the deprecated ones, as the deprecated methods have already
claimed that name. So we would have to come up with a different name for
these APIs, for example taskInfo()  would require the TaskId interface
which is a bit confusing. Or using getTaskId() temporarily until the
deprecated methods can be removed, which goes against the established
convention of no "get" in the getters. Another possibility is something
like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
taskId somewhere in the name.

Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the
interface after all, as in the example above to "TaskInfo". Or we could
take a similar approach and just build on the original TaskId name, calling
it TaskIdInfo or something like that. Any thoughts? As always the KIP has
been updated with what I am proposing, so you can take a look yourself.

On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman <sop...@confluent.io>
wrote:

> I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since
> it turned out to be more nuanced
> and grew in scope since I initially proposed it. Thanks John for
> the suggestions/questions on the [VOTE] thread,
> I have copied them over and addressed them below.
>
> Before that, I just want to apologize to everyone for bringing this to a
> vote immediately. I truly thought this would
> be a trivial KIP that just fixed what seemed like a bug in the API, but it
> slowly grew in scope as I wrangled with
> the question of whether TaskId was already a public API. To clarify, it is
> -- a TaskId is returned by the
> Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
> APIs at the time of writing this KIP.
> After finding those APIs and confirming that it was public, I began to
> wonder  whether it really should be.
>
> 1. "Topic Group" is a vestigial term from the early days of
>> the codebase. We call a "topic group" a "subtopology" in the
>> public interface now (although "topic group" is still used
>> internally some places). For user-facing consistency, we
>> should also use "subtopologyId" in your proposal.
>
>
> 100% agree, I remember being very confused by the term "topic group"  when
> I was first getting to know
> the Streams code base, and wondering why it wasn't just called
> "subtopology". This would be the perfect
> opportunity to migrate to the more clear terminology where appropriate.
> I've updated the KIP to reflect this.
>
> 2. I'm wondering if it's really necessary to introduce this
>> interface at all. I think your motivation is to be able to
>> get the subtopologyId and partition via TaskMetadata, right?
>> Why not just add those methods to TaskMetadata? Stepping
>> back, the concept of metadata about an identifier is a bit
>> elaborate.
>
>
> I think if the only issue we wanted to address was the TaskMetadata
> returning a string representation of a
> TaskId rather than the real thing, as was originally the case, then I
> would agree with just  unfolding the API
> to expose the TaskId fields directly. However while digging into the
> question of whether TaskId was public, it
> became clear that the class had some awkward features for a public API:
> public fields with no getters, many
> utility methods for internal functionality that had to be made public and
> cluttered up the API, etc. So I believe
> this KIP can and should take the opportunity to clean up the TaskId API
> altogether, not just for one getter in
> TaskMetadata. And we can't deprecate an API without adding a replacement
> for users to migrate onto, since
> the concept of a TaskId itself is still very much relevant. That's the
> primary motivation behind introducing this
> new interface. Of course technically this could be done by retaining the
> existing TaskId class and just moving
> the things that should not be public to the new internal class, but that's
> just a worse version of the same idea,
> and would require a somewhat awkward deprecation cycle. Plus, given that
> TaskId is intended to just be a
> wrapper for some misc information about a Task, a public interface feels
> more natural than a class.
>
> It also gives us more flexibility in the future, as it's much easier to
> just add a field to the TaskId (with a new
> getter on the interface) then to track  down all places where we return
> the TaskId fields, ie every topicGroupId()
> and partition() getters on an API, and add a new getter API for this field.
>
> Please refer to the KIP for the current proposal, including all deprecated
> and new APIs. I hope the latest
> iteration will make everyone happy. The only potentially controversial
> question remaining is how to name
> the new interface. Here are all the options that seem reasonable to me,
> though chose to go with the 2nd one:
>
> 1) Keep the name TaskId for the interface, put the interface in the
> o.a.k.streams package, and call the internal class
>     TaskIdImpl. This will require a fully qualified class in any file that
> references both the new interface and the
>     deprecated public TaskId class.
> 2) Same as the above, but allow the internal class to also retain the name
> TaskId. This will also require a fully qualified
>     class name whenever any of the three are referenced in the same file,
> though I don't think there is likely to be a single
>     file where the new internal TaskId class appears along with the old
> public TaskId class but not the new TaskId interface,
>     or vice versa, so the impact is probably exactly the same as the above
> option.
> 3) Name the interface something else entirely (eg TaskInfo) and keep the
> name TaskId for the new internal class. The
>     new internal TaskId and the deprecated public TaskId class most likely
> appear together. This option is probably the
>     least annoying with respect to requiring fully qualified class names,
> but I expect it's still on par with how often this
>     would be required in option 1 or 2.
>
> Given that both the new public interface and the deprecated public TaskId
> class are almost always going to appear together,
> we can't avoid fully qualified path names until the deprecated class can
> be removed, unless we give the interface a different
> name. And most of the time only the internal TaskId class will be
> referenced anyways, so the scope of fully qualified names
> should still be small with any option. In fact the vast majority of the
> codebase will be using the new internal class, so leaving it
> as "TaskId" will help prevent a truly massive PR, as only the import line
> will need to be changed, vs hundreds/thousands of lines
> in which the class name of the TaskId object appears. So I would rather
> avoid option 1.
>
> With option 2 we'll need to suppress a spotbugs warning, while with option
> 3 we are changing the name of a pretty fundamental
> concept throughout Streams, so users will need to mentally adapt to
> thinking about TaskInfo (or whatever) instead of in terms
> of TaskId. That is why I personally prefer option 2.
>
>
> Thanks for the discussion, sorry this turned into such a long response. I
> think we're on the same page now but I want to make
> sure the background and my general thought process behind all this was
> clear to all. I'll kick off the vote again but please respond
> here with any further concerns and save the [VOTE] thread for voting and
> maybe the occasional small question.
>
> -Sophie
>

Reply via email to