Guozhang, 1) I updated the KIP to option (b).
3) Agreed. So we leave this part out, and tackle it within KIP-130 -Matthias On 3/12/17 3:48 PM, Guozhang Wang wrote: > Thanks Matthias. > > 1) Given that TopologyDescription is for debugging purposes before > `KafkaStreams.start()`. I think the simplest option b) may be sufficient. > Just needs to emphasize its possible value semantics in Java docs. > > 3) You can tell that I was actually thinking about this together with > KIP-130. To me if we can expose the runtime information, which is dynamic, > via metrics in KIP-130 then we could remove this function. The main reason > is that, again, the task migration make this function's behavior a bit > difficult to explain. For example: > > streams.start(); > > sleep(/* some time */) > > streams.toString(); > > --------------------------- > > Even with the same configuration, depending on for how long did you wait > after started, the function could return very different string results due > to rebalances. > > That being said, I was not trying to make the decision in this KIP, as I > saw it more related to KIP-130. So we could probably still keep it as is in > KIP-120, and consider removing it in KIP-130. That's why I was just "asking > your thoughts on this", but not necessary wanting to make an action in this > KIP. > > > Guozhang > > > > On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <matth...@confluent.io> > wrote: > >> Thanks for your feedback Guozhang. >> >> >> 1) There are multiple ways to do this. Let me know what you think about >> all options: >> >> (a) I updated the KIP to this: >> >>> public final class Source implements Node { >>> public final String name; >>> // topicNames and topicPattern are mutually exclusive, i.e., >> only one will be not-null >>> public final List<String> topicNames; // null if #addSource(..., >> Pattern) was used >>> public final Pattern topicPattern; // null if #addSource(..., >> String...) was used >>> } >> >> (b) We could also go with a single variable (as originally proposed). >> This would have the advantage (compared to (a)), that null checks are >> not required accessing TopologyDescription#Source class. >> >>> String topics; // can be comma separated list of topic names or pattern >> (as String) >> >> However, with an encoded list or an encoded pattern it's required to >> parse the string again, what we want to avoid in the first place. >> >> (c) Use a single variable as in (b) >> >>> String topics; // always a pattern (as String) >> >> We translate a list of topic names into a pattern >> "topic1|topic2|topic3". We loose the information if the source was added >> via list or via pattern. >> >> >> >> 2) Your understanding is correct. Added a comment to the KIP. >> >> >> >> 3) I would keep KafkaStreams#toString() -- it's conceptually two >> different things and runtime information is useful, too. But as its >> return value is ambiguous to parse (and must be parsed in the first >> place what is cumbersome), we could add KafkaStreams#describe() as >> >>> public synchronized KafkaStreamsDescription describe(); >> >> KafkaStreamsDescription class would be similar to TopologyDescription to >> allow programmatic access to runtime information. I guess we could even >> reuse (parts of) TopologyDescription within KafkaStreamsDescription to >> avoid code duplication. >> >> If you think this would be useful, I can extend the KIP accordingly. >> >> >> >> -Matthias >> >> >> >> >> On 3/10/17 1:38 PM, Guozhang Wang wrote: >>> One more question here: >>> >>> 3. with TopologyDescription, do we still want to keep the >>> `KafkaStream.toString()` function? I think it may still have some >> advantage >>> such that it contains tasks information after `KafkaStream#start()` has >>> been called, but much of it is duplicate with the TopologyDescription, >> and >>> it is only in the form of the string hence hard to programmatically >>> leverage. So would like to hear your thoughts. >>> >>> Guozhang >>> >>> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wangg...@gmail.com> >> wrote: >>> >>>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor >> comments: >>>> >>>> 1. With regard to this class: >>>> >>>> public final class Source implements Node { >>>> public final String name; >>>> public final String topic; // can be topic name or pattern (as >>>> String) >>>> } >>>> >>>> Note that the source node could contain more than a single topic, i.e. a >>>> list of topics besides a pattern. >>>> >>>> 2. With regard to >>>> >>>> public synchronized TopologyDescription describe(); >>>> >>>> My understand is that whenever the topology is modified, one needs to >> call >>>> this function again to get a new description object, as the old one >> won't >>>> be updated automatically. Hence the usage pattern would be: >>>> >>>> TopologyDescription description = topology.describe(); >>>> >>>> topology.addProcessor(...) >>>> >>>> description = topology.describe(); // have to call again >>>> >>>> ----------- >>>> >>>> Is that right? If yes could you clarify this in the wiki? >>>> >>>> >>>> >>>> Guozhang >>>> >>>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mich...@confluent.io> >> wrote: >>>> >>>>> Thanks for the update, Matthias. >>>>> >>>>> +1 to the points 1,2,3,4 you mentioned. >>>>> >>>>> Naming is always a tricky subject, but renaming KStreamBuilder >>>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight >>>>> preference towards DslTopologyBuilder, but hey.) The most important >>>>> aspect >>>>> is, IMHO, what you also pointed out: to make it clear that the current >>>>> KStreamBuilder actually builds a topology (though currently the latter >> is >>>>> actually called `TopologyBuilder` currently), and not a `KStream`. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax < >> matth...@confluent.io> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> sorry for not replying earlier and thanks for all your feedback. After >>>>>> some more discussions I updated the KIP. The new proposal puts some >>>>>> other design considerations into account, that I want to highlight >>>>>> shortly. Those considerations, automatically resolve the concerns >>>>> raised. >>>>>> >>>>>> First some answers: >>>>>> >>>>>>> The PAPI processors I use in my KStreams app are all functioning on >>>>>> KTable >>>>>>> internals. I wouldn't be able to convert them to >>>>> process()/transform(). >>>>>>> >>>>>>> What's the harm in permitting both APIs to be used in the same >>>>>> application? >>>>>> >>>>>> It's not about "harm" but about design. We want to switch from a >>>>>> "inheritance" to a "composition" pattern. >>>>>> >>>>>> About the interface idea: using a shared interface would not help to >> get >>>>>> a composition pattern >>>>>> >>>>>> >>>>>> Next I want to give the design considerations leading to the updated >>>>> KIP: >>>>>> >>>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is >> unnatural. >>>>>> KafkaStreams client executes a `Topology` and this execution should be >>>>>> independent of the way the topology is "put together", ie, low-level >> API >>>>>> or DSL. >>>>>> >>>>>> 2) Thus, we don't want to have any changes to KafkaStreams class. >>>>>> >>>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns >> a >>>>>> `Topology` that can be passed into KafakStreams. >>>>>> >>>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to >>>>>> rename the new class to `StreamsTopologyBuilder` (the name >>>>>> TopologyBuilder would actually be more natural, but would be easily >>>>>> confused with old low-level API TopologyBuilder). >>>>>> >>>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as >>>>>> StreamsTopologyBuilder return the created Topology via #build(). >>>>>> >>>>>> I also removed `final` for both builder classes. >>>>>> >>>>>> >>>>>> >>>>>> With regard to the larger scope of the overal API redesign, I also >> want >>>>>> to point to a summary of API issues: >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/ >>>>>> Kafka+Streams+Discussions >>>>>> >>>>>> Thus, this KIP is only one building block of a larger improvement >>>>>> effort, and we hope to get as much as possible done for 0.11. If you >>>>>> have any API improvement ideas, please share them so we can come up >> with >>>>>> an holistic sound design (instead of uncoordinated local improvements >>>>>> that might diverge) >>>>>> >>>>>> >>>>>> >>>>>> Looking forward to your feedback on this KIP and the other API issues. >>>>>> >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote: >>>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax < >>>>> matth...@confluent.io> >>>>>>> wrote: >>>>>>> >>>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder >>>>> because >>>>>>>> we think #transform() should provide all functionality you need to >>>>>>>> mix-an-match Processor API and DSL. If there is any further concern >>>>>>>> about this, please let us know. >>>>>>>> >>>>>>> >>>>>>> Hi Matthias, >>>>>>> >>>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of >>>>>> concerns >>>>>>> about this. You're correct to point out that transform() can be used >>>>> for >>>>>>> some of the output situations I pointed out; albeit it seems somewhat >>>>>>> awkward to do so in a "transform" method; what do you do with the >>>>> retval? >>>>>>> >>>>>>> The PAPI processors I use in my KStreams app are all functioning on >>>>>> KTable >>>>>>> internals. I wouldn't be able to convert them to >>>>> process()/transform(). >>>>>>> >>>>>>> What's the harm in permitting both APIs to be used in the same >>>>>> application? >>>>>>> >>>>>>> Mathieu >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> -- Guozhang >>>> >>> >>> >>> >> >> > >
signature.asc
Description: OpenPGP digital signature