@Guozhang: I recognized that you want to have `Topology` in the name. But it seems that more people preferred to not have it (Jay, Ram, Michael [?], myself).
@Michael: You seemed to agree with Jay about not exposing the `Topology` concept in our main entry class (ie, current KStreamBuilder), thus, I interpreted that you do not want `Topology` in the name either (I am a little surprised by your last response, that goes the opposite direction). > StreamsBuilder builder = new StreamsBuilder(); > > // And here you'd define your...well, what actually? > // Ah right, you are composing a topology here, though you are not > aware of it. Yes. You are not aware of if -- that's the whole point about it -- don't put the Topology concept in the focus... Furthermore, >>> So what are you building here with StreamsBuilder? Streams (hint: No)? >>> And what about tables -- is there a TableBuilder (hint: No)? I am not sure, if this is too much a concern. In contrast to `KStreamBuilder` (singular) that contains `KStream` and thus puts KStream concept in focus and thus degrade `KTable`, `StreamsBuilder` (plural) focuses on "Streams API". IMHO, it does not put focus on KStream. It's just a builder from the Streams API -- you don't need to worry what you are building -- and you don't need to think about the `Topology` concept (of course, you see that .build() return a Topology). Personally, I see pros and cons for both `StreamsBuilder` and `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and Ram can follow up and share their thoughts? I would also help a lot if other people put their vote for a name, too. -Matthias On 3/21/17 2:11 PM, Guozhang Wang wrote: > Just to clarify, I did want to have the term `Topology` as part of the > class name, for the reasons above. I'm not too worried about to be > consistent with the previous names, but I feel the `XXTopologyBuilder` is > better than `XXStreamsBuilder` since it's build() function returns a > Topology object. > > > Guozhang > > > On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io> wrote: > >> Hmm, I must admit I don't like this last update all too much. >> >> Basically we would have: >> >> StreamsBuilder builder = new StreamsBuilder(); >> >> // And here you'd define your...well, what actually? >> // Ah right, you are composing a topology here, though you are not >> aware of it. >> >> KafkaStreams streams = new KafkaStreams(builder.build(), >> streamsConfiguration); >> >> So what are you building here with StreamsBuilder? Streams (hint: No)? >> And what about tables -- is there a TableBuilder (hint: No)? >> >> I also interpret Guozhang's last response as that he'd prefer to have >> "Topology" in the class/interface names. I am aware that we shouldn't >> necessarily use the status quo to make decisions about future changes, but >> the very first concept we explain in the Kafka Streams documentation is >> "Stream Processing Topology": >> https://kafka.apache.org/0102/documentation/streams#streams_concepts >> >> -Michael >> >> >> >> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io> >> wrote: >> >>> \cc users list >>> >>> >>> -------- Forwarded Message -------- >>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API >>> Date: Mon, 20 Mar 2017 11:51:01 -0700 >>> From: Matthias J. Sax <matth...@confluent.io> >>> Organization: Confluent Inc >>> To: dev@kafka.apache.org >>> >>> I want to push this discussion further. >>> >>> Guozhang's argument about "exposing" the Topology class is valid. It's a >>> public class anyway, so it's not as issue. However, I think the question >>> is not too much about exposing but about "advertising" (ie, putting it >>> into the focus) or not at DSL level. >>> >>> >>> If I interpret the last replies correctly, it seems that we could agree >>> on "StreamsBuilder" as name. I did update the KIP accordingly. Please >>> correct me, if I got this wrong. >>> >>> >>> If there are not other objects -- this naming discussion was the last >>> open point to far -- I would like the start the VOTE thread. >>> >>> >>> -Matthias >>> >>> >>> On 3/14/17 2:37 PM, Guozhang Wang wrote: >>>> I'd like to keep the term "Topology" inside the builder class since, as >>>> Matthias mentioned, this builder#build() function returns a "Topology" >>>> object, whose type is a public class anyways. Although you can argue to >>> let >>>> users always call >>>> >>>> "new KafkaStreams(builder.build())" >>>> >>>> I think it is still more benefit to expose this concept. >>>> >>>> >>>> >>>> Guozhang >>>> >>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax < >> matth...@confluent.io >>>> >>>> wrote: >>>> >>>>> Thanks for your input Michael. >>>>> >>>>>>> - KafkaStreams as the new name for the builder that creates the >>> logical >>>>>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and >>>>>>> `KafkaStreams.table("input-topic")`. >>>>> >>>>> I don't thinks this is a good idea, for multiple reasons: >>>>> >>>>> (1) We would reuse a name for a completely different purpose. The same >>>>> argument for not renaming KStreamBuilder to TopologyBuilder. The >>>>> confusion would just be too large. >>>>> >>>>> So if we would start from scratch, it might be ok to do so, but now we >>>>> cannot make this move, IMHO. >>>>> >>>>> Also a clarification question: do you suggest to have static methods >>>>> #stream and #table -- I am not sure if this would work? >>>>> (or was you code snippet just simplification?) >>>>> >>>>> >>>>> (2) Kafka Streams is basically a "processing client" next to consumer >>>>> and producer client. Thus, the name KafkaStreams aligns to the naming >>>>> schema of KafkaConsumer and KafkaProducer. I am not sure if it would >> be >>>>> a good choice to "break" this naming scheme. >>>>> >>>>> Btw: this is also the reason, why we have KafkaStreams#close() -- and >>>>> not KafkaStreams#stop() -- because #close() aligns with consumer and >>>>> producer client. >>>>> >>>>> >>>>> (3) On more argument against using KafkaStreams as DSL entry class >> would >>>>> be, that it would need to create a Topology that can be given to the >>>>> "runner/processing-client". Thus the pattern would be >>>>> >>>>>> Topology topology = streams.build(); >>>>>> KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology) >>>>> >>>>> (or of course as a one liner). >>>>> >>>>> >>>>> >>>>> On the other hand, there was the idea (that we intentionally excluded >>>>> from the KIP), to change the "client instantiation" pattern. >>>>> >>>>> Right now, a new client in actively instantiated (ie, by calling >> "new") >>>>> and the topology if provided as a constructor argument. However, >>>>> especially for DSL (not sure if it would make sense for PAPI), the DSL >>>>> builder could create the client for the user. >>>>> >>>>> Something like this: >>>>> >>>>>> KStreamBuilder builder = new KStreamBuilder(); >>>>>> builder.whatever() // use the builder >>>>>> >>>>>> StreamsConfig config = .... >>>>>> KafkaStreams streams = builder.getKafkaStreams(config); >>>>> >>>>> If we change the patter like this, the notion a the "DSL builder" >> would >>>>> change, as it does not create a topology anymore, but it creates the >>>>> "processing client". This would address Jay's concern about "not >>>>> exposing concept users don't need the understand" and would not >> require >>>>> to include the word "Topology" in the DSL builder class name, because >>>>> the builder does not build a Topology anymore. >>>>> >>>>> I just put some names that came to my mind first hand -- did not think >>>>> about good names. It's just to discuss the pattern. >>>>> >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 3/14/17 3:36 AM, Michael Noll wrote: >>>>>> I see Jay's point, and I agree with much of it -- notably about being >>>>>> careful which concepts we do and do not expose, depending on which >> user >>>>>> group / user type is affected. That said, I'm not sure yet whether >> or >>>>> not >>>>>> we should get rid of "Topology" (or a similar term) in the DSL. >>>>>> >>>>>> For what it's worth, here's how related technologies define/name >> their >>>>>> "topologies" and "builders". Note that, in all cases, it's about >>>>>> constructing a logical processing plan, which then is being >>> executed/run. >>>>>> >>>>>> - `Pipeline` (Google Dataflow/Apache Beam) >>>>>> - To add a source you first instantiate the Source (e.g. >>>>>> `TextIO.Read.from("gs://some/inputData.txt")`), >>>>>> then attach it to your processing plan via >>>>> `Pipeline#apply(<source>)`. >>>>>> This setup is a bit different to our DSL because in our DSL the >>>>>> builder does both, i.e. >>>>>> instantiating + auto-attaching to itself. >>>>>> - To execute the processing plan you call `Pipeline#execute()`. >>>>>> - `StreamingContext`` (Spark): This setup is similar to our DSL. >>>>>> - To add a source you call e.g. >>>>>> `StreamingContext#socketTextStream("localhost", 9999)`. >>>>>> - To execute the processing plan you call >>>>> `StreamingContext#execute()`. >>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our >>> DSL. >>>>>> - To add a source you call e.g. >>>>>> `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`. >>>>>> - To execute the processing plan you call >>>>>> `StreamExecutionEnvironment#execute()`. >>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~ >>>>>> `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) >>>>>> into Flows, which are [Runnable]Graphs. >>>>>> - You instantiate a Source directly, and then compose the Source >>> with >>>>>> Sinks to create a RunnableGraph: >>>>>> see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], >>> Mat2]): >>>>>> RunnableGraph[Mat]`. >>>>>> - To execute the processing plan you call `Flow#run()`. >>>>>> >>>>>> In our DSL, in comparison, we do: >>>>>> >>>>>> - `KStreamBuilder` (Kafka Streams API) >>>>>> - To add a source you call e.g. `KStreamBuilder#stream("input- >>>>> topic")`. >>>>>> - To execute the processing plan you create a `KafkaStreams` >>> instance >>>>>> from `KStreamBuilder` >>>>>> (where the builder will instantiate the topology = processing >>> plan >>>>> to >>>>>> be executed), and then >>>>>> call `KafkaStreams#start()`. Think of `KafkaStreams` as our >>>>> runner. >>>>>> >>>>>> First, I agree with the sentiment that the current name of >>>>> `KStreamBuilder` >>>>>> isn't great (which is why we're having this discussion). Also, that >>>>>> finding a good name is tricky. ;-) >>>>>> >>>>>> Second, even though I agree with many of Jay's points I'm not sure >>>>> whether >>>>>> I like the `StreamsBuilder` suggestion (i.e. any name that does not >>>>> include >>>>>> "topology" or a similar term) that much more. It still doesn't >>> describe >>>>>> what that class actually does, and what the difference to >>> `KafkaStreams` >>>>>> is. IMHO, the point of `KStreamBuilder` is that it lets you build a >>>>>> logical plan (what we call "topology"), and `KafkaStreams` is the >> thing >>>>>> that executes that plan. I'm not yet convinced that abstracting >> these >>>>> two >>>>>> points away from the user is a good idea if the argument is that it's >>>>>> potentially confusing to beginners (a claim which I am not sure is >>>>> actually >>>>>> true). >>>>>> >>>>>> That said, if we rather favor "good-sounding but perhaps less >>> technically >>>>>> correct names", I'd argue we should not even use something like >>>>> "Builder". >>>>>> We could, for example, also pick the following names: >>>>>> >>>>>> - KafkaStreams as the new name for the builder that creates the >> logical >>>>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and >>>>>> `KafkaStreams.table("input-topic")`. >>>>>> - KafkaStreamsRunner as the new name for the executioner of the plan, >>>>> with >>>>>> `KafkaStreamsRunner(KafkaStreams).run()`. >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian < >> r...@confluent.io> >>>>>> wrote: >>>>>> >>>>>>> StreamsBuilder would be my vote. >>>>>>> >>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote: >>>>>>>> >>>>>>>> Hey Matthias, >>>>>>>> >>>>>>>> Make sense, I'm more advocating for removing the word topology than >>> any >>>>>>>> particular new replacement. >>>>>>>> >>>>>>>> -Jay >>>>>>>> >>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax < >>>>> matth...@confluent.io >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Jay, >>>>>>>>> >>>>>>>>> thanks for your feedback >>>>>>>>> >>>>>>>>>> What if instead we called it KStreamsBuilder? >>>>>>>>> >>>>>>>>> That's the current name and I personally think it's not the best >>> one. >>>>>>>>> The main reason why I don't like KStreamsBuilder is, that we have >>> the >>>>>>>>> concepts of KStreams and KTables, and the builder creates both. >>>>> However, >>>>>>>>> the name puts he focus on KStream and devalues KTable. >>>>>>>>> >>>>>>>>> I understand your argument, and I am personally open the remove >> the >>>>>>>>> "Topology" part, and name it "StreamsBuilder". Not sure what >> others >>>>>>>>> think about this. >>>>>>>>> >>>>>>>>> >>>>>>>>> About Processor API: I like the idea in general, but I thinks it's >>> out >>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking >>>>>>>>> internal APIs and do some cleanup how our API reflects some >>> concepts. >>>>>>>>> >>>>>>>>> However, I added your idea to API discussion Wiki page and we take >>> if >>>>>>>>> from there: >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/ >>>>>>>>> Kafka+Streams+Discussions >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote: >>>>>>>>>> Two things: >>>>>>>>>> >>>>>>>>>> 1. This is a minor thing but the proposed new name for >>>>> KStreamBuilder >>>>>>>>>> is StreamsTopologyBuilder. I actually think we should not put >>>>>>>>> topology in >>>>>>>>>> the name as topology is not a concept you need to understand at >>> the >>>>>>>>>> kstreams layer right now. I'd think of three categories of >>>>> concepts: >>>>>>>>> (1) >>>>>>>>>> concepts you need to understand to get going even for a simple >>>>>>>>> example, (2) >>>>>>>>>> concepts you need to understand to operate and debug a real >>>>>>>>> production app, >>>>>>>>>> (3) concepts we truly abstract and you don't need to ever >>>>> understand. >>>>>>>>> I >>>>>>>>>> think in the kstream layer topologies are currently category >> (2), >>>>> and >>>>>>>>> this >>>>>>>>>> is where they belong. By introducing the name in even the >>> simplest >>>>>>>>> example >>>>>>>>>> it means the user has to go read about toplogies to really >>>>> understand >>>>>>>>> even >>>>>>>>>> this simple snippet. What if instead we called it >>> KStreamsBuilder? >>>>>>>>>> 2. For the processor api, I think this api is mostly not for >> end >>>>>>>>> users. >>>>>>>>>> However this are a couple cases where it might make sense to >>> expose >>>>>>>>> it. I >>>>>>>>>> think users coming from Samza, or JMS's MessageListener ( >>>>>>>>>> https://docs.oracle.com/javaee/7/api/javax/jms/ >>>>> MessageListener.html) >>>>>>>>>> understand a simple callback interface for message processing. >> In >>>>>>>>> fact, >>>>>>>>>> people often ask why Kafka's consumer doesn't provide such an >>>>>>>>> interface. >>>>>>>>>> I'd argue we do, it's KafkaStreams. The only issue is that the >>>>>>>>> processor >>>>>>>>>> API documentation is a bit scary for a person implementing this >>>>> type >>>>>>>>> of >>>>>>>>>> api. My observation is that people using this style of API >> don't >>>>> do a >>>>>>>>> lot >>>>>>>>>> of cross-message operations, then just do single message >>> operations >>>>>>>>> and use >>>>>>>>>> a database for anything that spans messages. They also don't >>> factor >>>>>>>>> their >>>>>>>>>> code into many MessageListeners and compose them, they just >> have >>>>> one >>>>>>>>>> listener that has the complete handling logic. Say I am a user >>> who >>>>>>>>> wants to >>>>>>>>>> implement a single Processor in this style. Do we have an easy >>> way >>>>> to >>>>>>>>> do >>>>>>>>>> that today (either with the .transform/.process methods in >>> kstreams >>>>>>>>> or with >>>>>>>>>> the topology apis)? Is there anything we can do in the way of >>>>> trivial >>>>>>>>>> helper code to make this better? Also, how can we explain that >>>>>>>>> pattern to >>>>>>>>>> people? I think currently we have pretty in-depth docs on our >>> apis >>>>>>>>> but I >>>>>>>>>> suspect a person trying to figure out how to implement a simple >>>>>>>>> callback >>>>>>>>>> might get a bit lost trying to figure out how to wire it up. A >>>>> simple >>>>>>>>> five >>>>>>>>>> line example in the docs would probably help a lot. Not sure if >>>>> this >>>>>>>>> is >>>>>>>>>> best addressed in this KIP or is a side comment. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> -Jay >>>>>>>>>> >>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax < >>>>> matth...@confluent.io >>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming >>>>> API. >>>>>>>>>>> >>>>>>>>>>> Please have a look here: >>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>>>>> 120%3A+Cleanup+Kafka+Streams+builder+API >>>>>>>>>>> >>>>>>>>>>> Looking forward to your feedback! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>> >> > > >
signature.asc
Description: OpenPGP digital signature