Thanks Matthias. 1) Given that TopologyDescription is for debugging purposes before `KafkaStreams.start()`. I think the simplest option b) may be sufficient. Just needs to emphasize its possible value semantics in Java docs.
3) You can tell that I was actually thinking about this together with KIP-130. To me if we can expose the runtime information, which is dynamic, via metrics in KIP-130 then we could remove this function. The main reason is that, again, the task migration make this function's behavior a bit difficult to explain. For example: streams.start(); sleep(/* some time */) streams.toString(); --------------------------- Even with the same configuration, depending on for how long did you wait after started, the function could return very different string results due to rebalances. That being said, I was not trying to make the decision in this KIP, as I saw it more related to KIP-130. So we could probably still keep it as is in KIP-120, and consider removing it in KIP-130. That's why I was just "asking your thoughts on this", but not necessary wanting to make an action in this KIP. Guozhang On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <[email protected]> wrote: > Thanks for your feedback Guozhang. > > > 1) There are multiple ways to do this. Let me know what you think about > all options: > > (a) I updated the KIP to this: > > > public final class Source implements Node { > > public final String name; > > // topicNames and topicPattern are mutually exclusive, i.e., > only one will be not-null > > public final List<String> topicNames; // null if #addSource(..., > Pattern) was used > > public final Pattern topicPattern; // null if #addSource(..., > String...) was used > > } > > (b) We could also go with a single variable (as originally proposed). > This would have the advantage (compared to (a)), that null checks are > not required accessing TopologyDescription#Source class. > > > String topics; // can be comma separated list of topic names or pattern > (as String) > > However, with an encoded list or an encoded pattern it's required to > parse the string again, what we want to avoid in the first place. > > (c) Use a single variable as in (b) > > > String topics; // always a pattern (as String) > > We translate a list of topic names into a pattern > "topic1|topic2|topic3". We loose the information if the source was added > via list or via pattern. > > > > 2) Your understanding is correct. Added a comment to the KIP. > > > > 3) I would keep KafkaStreams#toString() -- it's conceptually two > different things and runtime information is useful, too. But as its > return value is ambiguous to parse (and must be parsed in the first > place what is cumbersome), we could add KafkaStreams#describe() as > > > public synchronized KafkaStreamsDescription describe(); > > KafkaStreamsDescription class would be similar to TopologyDescription to > allow programmatic access to runtime information. I guess we could even > reuse (parts of) TopologyDescription within KafkaStreamsDescription to > avoid code duplication. > > If you think this would be useful, I can extend the KIP accordingly. > > > > -Matthias > > > > > On 3/10/17 1:38 PM, Guozhang Wang wrote: > > One more question here: > > > > 3. with TopologyDescription, do we still want to keep the > > `KafkaStream.toString()` function? I think it may still have some > advantage > > such that it contains tasks information after `KafkaStream#start()` has > > been called, but much of it is duplicate with the TopologyDescription, > and > > it is only in the form of the string hence hard to programmatically > > leverage. So would like to hear your thoughts. > > > > Guozhang > > > > On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <[email protected]> > wrote: > > > >> Thanks Matthias, the updated KIP lgtm overall. A couple of minor > comments: > >> > >> 1. With regard to this class: > >> > >> public final class Source implements Node { > >> public final String name; > >> public final String topic; // can be topic name or pattern (as > >> String) > >> } > >> > >> Note that the source node could contain more than a single topic, i.e. a > >> list of topics besides a pattern. > >> > >> 2. With regard to > >> > >> public synchronized TopologyDescription describe(); > >> > >> My understand is that whenever the topology is modified, one needs to > call > >> this function again to get a new description object, as the old one > won't > >> be updated automatically. Hence the usage pattern would be: > >> > >> TopologyDescription description = topology.describe(); > >> > >> topology.addProcessor(...) > >> > >> description = topology.describe(); // have to call again > >> > >> ----------- > >> > >> Is that right? If yes could you clarify this in the wiki? > >> > >> > >> > >> Guozhang > >> > >> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <[email protected]> > wrote: > >> > >>> Thanks for the update, Matthias. > >>> > >>> +1 to the points 1,2,3,4 you mentioned. > >>> > >>> Naming is always a tricky subject, but renaming KStreamBuilder > >>> to StreamsTopologyBuilder looks ok to me (I would have had a slight > >>> preference towards DslTopologyBuilder, but hey.) The most important > >>> aspect > >>> is, IMHO, what you also pointed out: to make it clear that the current > >>> KStreamBuilder actually builds a topology (though currently the latter > is > >>> actually called `TopologyBuilder` currently), and not a `KStream`. > >>> > >>> > >>> > >>> > >>> > >>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax < > [email protected]> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> sorry for not replying earlier and thanks for all your feedback. After > >>>> some more discussions I updated the KIP. The new proposal puts some > >>>> other design considerations into account, that I want to highlight > >>>> shortly. Those considerations, automatically resolve the concerns > >>> raised. > >>>> > >>>> First some answers: > >>>> > >>>>> The PAPI processors I use in my KStreams app are all functioning on > >>>> KTable > >>>>> internals. I wouldn't be able to convert them to > >>> process()/transform(). > >>>>> > >>>>> What's the harm in permitting both APIs to be used in the same > >>>> application? > >>>> > >>>> It's not about "harm" but about design. We want to switch from a > >>>> "inheritance" to a "composition" pattern. > >>>> > >>>> About the interface idea: using a shared interface would not help to > get > >>>> a composition pattern > >>>> > >>>> > >>>> Next I want to give the design considerations leading to the updated > >>> KIP: > >>>> > >>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is > unnatural. > >>>> KafkaStreams client executes a `Topology` and this execution should be > >>>> independent of the way the topology is "put together", ie, low-level > API > >>>> or DSL. > >>>> > >>>> 2) Thus, we don't want to have any changes to KafkaStreams class. > >>>> > >>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns > a > >>>> `Topology` that can be passed into KafakStreams. > >>>> > >>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to > >>>> rename the new class to `StreamsTopologyBuilder` (the name > >>>> TopologyBuilder would actually be more natural, but would be easily > >>>> confused with old low-level API TopologyBuilder). > >>>> > >>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as > >>>> StreamsTopologyBuilder return the created Topology via #build(). > >>>> > >>>> I also removed `final` for both builder classes. > >>>> > >>>> > >>>> > >>>> With regard to the larger scope of the overal API redesign, I also > want > >>>> to point to a summary of API issues: > >>>> https://cwiki.apache.org/confluence/display/KAFKA/ > >>>> Kafka+Streams+Discussions > >>>> > >>>> Thus, this KIP is only one building block of a larger improvement > >>>> effort, and we hope to get as much as possible done for 0.11. If you > >>>> have any API improvement ideas, please share them so we can come up > with > >>>> an holistic sound design (instead of uncoordinated local improvements > >>>> that might diverge) > >>>> > >>>> > >>>> > >>>> Looking forward to your feedback on this KIP and the other API issues. > >>>> > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> > >>>> > >>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote: > >>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax < > >>> [email protected]> > >>>>> wrote: > >>>>> > >>>>>> - We also removed method #topologyBuilder() from KStreamBuilder > >>> because > >>>>>> we think #transform() should provide all functionality you need to > >>>>>> mix-an-match Processor API and DSL. If there is any further concern > >>>>>> about this, please let us know. > >>>>>> > >>>>> > >>>>> Hi Matthias, > >>>>> > >>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of > >>>> concerns > >>>>> about this. You're correct to point out that transform() can be used > >>> for > >>>>> some of the output situations I pointed out; albeit it seems somewhat > >>>>> awkward to do so in a "transform" method; what do you do with the > >>> retval? > >>>>> > >>>>> The PAPI processors I use in my KStreams app are all functioning on > >>>> KTable > >>>>> internals. I wouldn't be able to convert them to > >>> process()/transform(). > >>>>> > >>>>> What's the harm in permitting both APIs to be used in the same > >>>> application? > >>>>> > >>>>> Mathieu > >>>>> > >>>> > >>>> > >>> > >> > >> > >> > >> -- > >> -- Guozhang > >> > > > > > > > > -- -- Guozhang
