Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Guozhang Wang Sun, 12 Mar 2017 15:49:35 -0700

Thanks Matthias.

1) Given that TopologyDescription is for debugging purposes before
`KafkaStreams.start()`. I think the simplest option b) may be sufficient.
Just needs to emphasize its possible value semantics in Java docs.


3) You can tell that I was actually thinking about this together with
KIP-130. To me if we can expose the runtime information, which is dynamic,
via metrics in KIP-130 then we could remove this function. The main reason
is that, again, the task migration make this function's behavior a bit
difficult to explain. For example:

streams.start();

sleep(/* some time */)

streams.toString();

---------------------------

Even with the same configuration, depending on for how long did you wait
after started, the function could return very different string results due
to rebalances.

That being said, I was not trying to make the decision in this KIP, as I
saw it more related to KIP-130. So we could probably still keep it as is in
KIP-120, and consider removing it in KIP-130. That's why I was just "asking
your thoughts on this", but not necessary wanting to make an action in this
KIP.


Guozhang



On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <[email protected]>
wrote:

> Thanks for your feedback Guozhang.
>
>
> 1) There are multiple ways to do this. Let me know what you think about
> all options:
>
> (a) I updated the KIP to this:
>
> >     public final class Source implements Node {
> >         public final String name;
> >         // topicNames and topicPattern are mutually exclusive, i.e.,
> only one will be not-null
> >         public final List<String> topicNames; // null if #addSource(...,
> Pattern) was used
> >         public final Pattern topicPattern; // null if #addSource(...,
> String...) was used
> >     }
>
> (b) We could also go with a single variable (as originally proposed).
> This would have the advantage (compared to (a)), that null checks are
> not required accessing TopologyDescription#Source class.
>
> > String topics; // can be comma separated list of topic names or pattern
> (as String)
>
> However, with an encoded list or an encoded pattern it's required to
> parse the string again, what we want to avoid in the first place.
>
> (c) Use a single variable as in (b)
>
> > String topics; // always a pattern (as String)
>
> We translate a list of topic names into a pattern
> "topic1|topic2|topic3". We loose the information if the source was added
> via list or via pattern.
>
>
>
> 2) Your understanding is correct. Added a comment to the KIP.
>
>
>
> 3) I would keep KafkaStreams#toString() -- it's conceptually two
> different things and runtime information is useful, too. But as its
> return value is ambiguous to parse (and must be parsed in the first
> place what is cumbersome), we could add KafkaStreams#describe() as
>
> > public synchronized KafkaStreamsDescription describe();
>
> KafkaStreamsDescription class would be similar to TopologyDescription to
> allow programmatic access to runtime information. I guess we could even
> reuse (parts of) TopologyDescription within KafkaStreamsDescription to
> avoid code duplication.
>
> If you think this would be useful, I can extend the KIP accordingly.
>
>
>
> -Matthias
>
>
>
>
> On 3/10/17 1:38 PM, Guozhang Wang wrote:
> > One more question here:
> >
> > 3. with TopologyDescription, do we still want to keep the
> > `KafkaStream.toString()` function? I think it may still have some
> advantage
> > such that it contains tasks information after `KafkaStream#start()` has
> > been called, but much of it is duplicate with the TopologyDescription,
> and
> > it is only in the form of the string hence hard to programmatically
> > leverage. So would like to hear your thoughts.
> >
> > Guozhang
> >
> > On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <[email protected]>
> wrote:
> >
> >> Thanks Matthias, the updated KIP lgtm overall. A couple of minor
> comments:
> >>
> >> 1. With regard to this class:
> >>
> >>     public final class Source implements Node {
> >>         public final String name;
> >>         public final String topic; // can be topic name or pattern (as
> >> String)
> >>     }
> >>
> >> Note that the source node could contain more than a single topic, i.e. a
> >> list of topics besides a pattern.
> >>
> >> 2. With regard to
> >>
> >>           public synchronized TopologyDescription describe();
> >>
> >> My understand is that whenever the topology is modified, one needs to
> call
> >> this function again to get a new description object, as the old one
> won't
> >> be updated automatically. Hence the usage pattern would be:
> >>
> >> TopologyDescription description = topology.describe();
> >>
> >> topology.addProcessor(...)
> >>
> >> description = topology.describe(); // have to call again
> >>
> >> -----------
> >>
> >> Is that right? If yes could you clarify this in the wiki?
> >>
> >>
> >>
> >> Guozhang
> >>
> >> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <[email protected]>
> wrote:
> >>
> >>> Thanks for the update, Matthias.
> >>>
> >>> +1 to the points 1,2,3,4 you mentioned.
> >>>
> >>> Naming is always a tricky subject, but renaming KStreamBuilder
> >>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
> >>> preference towards DslTopologyBuilder, but hey.)  The most important
> >>> aspect
> >>> is, IMHO, what you also pointed out:  to make it clear that the current
> >>> KStreamBuilder actually builds a topology (though currently the latter
> is
> >>> actually called `TopologyBuilder` currently), and not a `KStream`.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <
> [email protected]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> sorry for not replying earlier and thanks for all your feedback. After
> >>>> some more discussions I updated the KIP. The new proposal puts some
> >>>> other design considerations into account, that I want to highlight
> >>>> shortly. Those considerations, automatically resolve the concerns
> >>> raised.
> >>>>
> >>>> First some answers:
> >>>>
> >>>>> The PAPI processors I use in my KStreams app are all functioning on
> >>>> KTable
> >>>>> internals.  I wouldn't be able to convert them to
> >>> process()/transform().
> >>>>>
> >>>>> What's the harm in permitting both APIs to be used in the same
> >>>> application?
> >>>>
> >>>> It's not about "harm" but about design. We want to switch from a
> >>>> "inheritance" to a "composition" pattern.
> >>>>
> >>>> About the interface idea: using a shared interface would not help to
> get
> >>>> a composition pattern
> >>>>
> >>>>
> >>>> Next I want to give the design considerations leading to the updated
> >>> KIP:
> >>>>
> >>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is
> unnatural.
> >>>> KafkaStreams client executes a `Topology` and this execution should be
> >>>> independent of the way the topology is "put together", ie, low-level
> API
> >>>> or DSL.
> >>>>
> >>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
> >>>>
> >>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns
> a
> >>>> `Topology` that can be passed into KafakStreams.
> >>>>
> >>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> >>>> rename the new class to `StreamsTopologyBuilder` (the name
> >>>> TopologyBuilder would actually be more natural, but would be easily
> >>>> confused with old low-level API TopologyBuilder).
> >>>>
> >>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
> >>>> StreamsTopologyBuilder return the created Topology via #build().
> >>>>
> >>>> I also removed `final` for both builder classes.
> >>>>
> >>>>
> >>>>
> >>>> With regard to the larger scope of the overal API redesign, I also
> want
> >>>> to point to a summary of API issues:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>>> Kafka+Streams+Discussions
> >>>>
> >>>> Thus, this KIP is only one building block of a larger improvement
> >>>> effort, and we hope to get as much as possible done for 0.11. If you
> >>>> have any API improvement ideas, please share them so we can come up
> with
> >>>> an holistic sound design (instead of uncoordinated local improvements
> >>>> that might diverge)
> >>>>
> >>>>
> >>>>
> >>>> Looking forward to your feedback on this KIP and the other API issues.
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> >>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
> >>> [email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
> >>> because
> >>>>>> we think #transform() should provide all functionality you need to
> >>>>>> mix-an-match Processor API and DSL. If there is any further concern
> >>>>>> about this, please let us know.
> >>>>>>
> >>>>>
> >>>>> Hi Matthias,
> >>>>>
> >>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> >>>> concerns
> >>>>> about this.  You're correct to point out that transform() can be used
> >>> for
> >>>>> some of the output situations I pointed out; albeit it seems somewhat
> >>>>> awkward to do so in a "transform" method; what do you do with the
> >>> retval?
> >>>>>
> >>>>> The PAPI processors I use in my KStreams app are all functioning on
> >>>> KTable
> >>>>> internals.  I wouldn't be able to convert them to
> >>> process()/transform().
> >>>>>
> >>>>> What's the harm in permitting both APIs to be used in the same
> >>>> application?
> >>>>>
> >>>>> Mathieu
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Reply via email to