Just a small correction to KIP-120: I changed it to use `Set` instead of `List` within `TopologyDescription` and highlighted that we use getters (added a couple of `()`). I also moved `name` to the `Node` interface, as all nodes have a name.
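For illustration only, here is a minimal Java sketch of the shape described above: `Set`-typed getters and a `name()` getter on the common `Node` interface. The class `TopologyDescriptionSketch`, the helper `nodeNames`, and the getter names `subtopologies()` and `nodes()` are assumptions made for this example; the KIP wiki page remains the authoritative definition.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in types for illustration; not the KIP-120 text itself.
public class TopologyDescriptionSketch {

    /** Every node has a name, so the getter lives on the common interface. */
    public interface Node {
        String name();
    }

    /** Assumed getter name; returns a Set rather than a List. */
    public interface Subtopology {
        Set<Node> nodes();
    }

    /** Assumed getter name; returns a Set rather than a List. */
    public interface TopologyDescription {
        Set<Subtopology> subtopologies();
    }

    /** Hypothetical helper: collects the names of all nodes in a description. */
    public static Set<String> nodeNames(TopologyDescription description) {
        Set<String> names = new LinkedHashSet<>();
        for (Subtopology subtopology : description.subtopologies()) {
            for (Node node : subtopology.nodes()) {
                names.add(node.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Build a tiny description by hand using lambdas for the getters.
        Set<Node> nodes = new LinkedHashSet<>();
        nodes.add(() -> "source-1");
        nodes.add(() -> "processor-1");

        Subtopology subtopology = () -> nodes;
        Set<Subtopology> subtopologies = new LinkedHashSet<>();
        subtopologies.add(subtopology);

        TopologyDescription description = () -> subtopologies;
        System.out.println(nodeNames(description));
    }
}
```

Since the collections are sets, callers should not rely on a particular iteration order unless the implementation documents one.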
Also note that I updated `GlobalStore` -- it's more detailed now (it was
incomplete before): the original information is contained in the nested nodes.

https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=67641273&selectedPageVersions=20&selectedPageVersions=17


-Matthias

On 3/28/17 7:01 PM, Matthias J. Sax wrote:
> With regard to KIP-130:
>
> From the KIP-130 thread:
>
>> About subtopologies and tasks. We do have the concept of subtopologies
>> already in KIP-120. It's only missing an ID that allows linking a
>> subtopology to a task.
>>
>> IMHO, adding a simple variable to `Subtopology` that provides the id should be
>> sufficient. We can simply document in the JavaDocs how Subtopology and
>> TaskMetadata can be linked to each other.
>
> I updated KIP-120 to include a field for this.
>
>
> -Matthias
>
>
> On 3/27/17 4:27 PM, Matthias J. Sax wrote:
>> Hi,
>>
>> I would like to trigger this discussion again. It seems that the naming
>> question is rather subjective and both main alternatives (w/ or w/o the
>> word "Topology" in the name) have pros/cons.
>>
>> If you have any further thoughts, please share them. At the moment I still
>> propose `StreamsBuilder` in the KIP.
>>
>> I also want to point out that the VOTE thread was already started. So
>> if you like the current KIP, please cast your vote there.
>>
>>
>> Thanks a lot!
>>
>>
>> -Matthias
>>
>>
>> On 3/23/17 3:38 PM, Matthias J. Sax wrote:
>>> Jay,
>>>
>>> about the naming schema:
>>>
>>>>> 1. "kstreams" - the DSL
>>>>> 2. "processor api" - the lower level callback/topology api
>>>>> 3. KStream/KTable - entities in the kstreams dsl
>>>>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>>> including both kstreams and the processor API plus the underlying
>>>>> implementation.
>>>
>>> I think this terminology has some issues... To me, `kstreams` was
>>> always no more than an abbreviation for `Kafka Streams` -- thus (1) and
>>> (4) kinda collide here.
>>> Following questions on the mailing list etc., I
>>> often see people using kstreams or kstream exactly as an abbr. for "Kafka
>>> Streams".
>>>
>>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>>> particularly confusing.
>>>
>>> I disagree here. It's a very subtle difference between `kstreams` and
>>> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
>>> it's just too close to each other.
>>>
>>> Thus, I really think it's a good idea to get a new name for the DSL to
>>> get a better separation of the 4 concepts.
>>>
>>> Furthermore, we use the term "Streams API". Thus, I think
>>> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
>>>
>>>
>>> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
>>>
>>> I will start a VOTE thread. Of course, we can still discuss the naming
>>> issue. :)
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 3/22/17 8:53 PM, Jay Kreps wrote:
>>>> I don't feel strongly on this, so I'm happy with whatever everyone else
>>>> wants.
>>>>
>>>> Michael, I'm not arguing that people don't need to understand topologies, I
>>>> just think it is like RocksDB, you need to understand it when
>>>> debugging/operating but not in the initial coding since the metaphor we're
>>>> providing at this layer isn't a topology of processors but rather something
>>>> like the collections api. Anyhow it won't hurt people to have it there.
>>>>
>>>> For the original KStreamBuilder thing, I think that came from the naming we
>>>> discussed originally:
>>>>
>>>> 1. "kstreams" - the DSL
>>>> 2. "processor api" - the lower level callback/topology api
>>>> 3. KStream/KTable - entities in the kstreams dsl
>>>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>> including both kstreams and the processor API plus the underlying
>>>> implementation.
>>>>
>>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>>> particularly confusing.
>>>> Just like referring to the "java collections
>>>> library" isn't confusing even though it contains the Iterator interface
>>>> which is not actually itself a collection.
>>>>
>>>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>>>> is intended not to be a builder of a KStream but rather the builder for the
>>>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>>>
>>>> -Jay
>>>>
>>>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>
>>>>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder`
>>>>> that are going to be used in the DSL, I agree both have their arguments:
>>>>>
>>>>> 1. On one side, people using the DSL layer probably do not need to be
>>>>> aware of (or rather, "learn about") the "topology" concept, although this
>>>>> concept is a publicly exposed one in Kafka Streams.
>>>>>
>>>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>>>> sounds a little weird, at least to me (admittedly a subjective matter).
>>>>>
>>>>>
>>>>> Since the second bullet point seems to be more "subjective" and many
>>>>> people are not worried about it, I'm OK to go with the other option.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Forwarding to kafka-user.
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Michael Noll <mich...@confluent.io>
>>>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>> To: dev@kafka.apache.org
>>>>>>
>>>>>>
>>>>>> Matthias,
>>>>>>
>>>>>>> @Michael:
>>>>>>>
>>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>>> little surprised by your last response, that goes the opposite
>>>>>>> direction).
>>>>>>
>>>>>> Oh, sorry for not being clear.
>>>>>>
>>>>>> What I wanted to say in my earlier email was the following: Yes, I do
>>>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>>>> much and which parts of the API/concept "surface" we expose to users of
>>>>>> the DSL. However, and this is perhaps where I wasn't very clear, I
>>>>>> disagree on the particular opinion about not exposing the topology
>>>>>> concept to DSL users. Instead, I think the concept of a topology is
>>>>>> important to understand even for DSL users -- particularly because of the
>>>>>> way the DSL is currently wiring your processing logic via the builder
>>>>>> pattern. (As I noted, e.g. Akka uses a different approach where you might
>>>>>> be able to get away with not exposing the "topology" concept, but even in
>>>>>> Akka there's the notion of graphs and flows.)
>>>>>>
>>>>>>
>>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>> // And here you'd define your...well, what actually?
>>>>>>>> // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>>
>>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>>>> don't put the Topology concept in the focus...
>>>>>>
>>>>>> Let me turn this around, because that was my point: it's confusing to
>>>>>> have a name "StreamsBuilder" if that thing isn't building streams, and it
>>>>>> is not.
>>>>>>
>>>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>>>> your processing, and (2) the execution of that plan. I have a less
>>>>>> strong opinion whether or not having "topology" in the names would help to
>>>>>> communicate this separation as well as combination of (1) and (2) to make
>>>>>> your app work as expected.
>>>>>>
>>>>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>>>>> "topology" in the name, then perhaps we should rename `KStreamBuilder` to
>>>>>> `KafkaStreamsBuilder`. That at least gives some illusion of a combo of
>>>>>> (1) and (2). IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>>>> streams".
>>>>>>
>>>>>> Also, I think some of the naming challenges we're discussing here are
>>>>>> caused by having this builder pattern in the first place. If the Streams
>>>>>> API was implemented in Scala, for example, we could use implicits for
>>>>>> helping us to "stitch streams/tables together to build the full
>>>>>> topology", thus using a different (better?) approach to composing your
>>>>>> topologies than through a builder pattern. So: perhaps there's a better
>>>>>> way than the builder, and that way would also be clearer on terminology?
>>>>>> That said, this might take this KIP off-scope.
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> @Guozhang:
>>>>>>>
>>>>>>> I recognized that you want to have `Topology` in the name.
>>>>>>> But it seems
>>>>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>>>>> myself).
>>>>>>>
>>>>>>> @Michael:
>>>>>>>
>>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>>> little surprised by your last response, that goes the opposite
>>>>>>> direction).
>>>>>>>
>>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>> // And here you'd define your...well, what actually?
>>>>>>>> // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>>
>>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>>>> don't put the Topology concept in the focus...
>>>>>>>
>>>>>>> Furthermore,
>>>>>>>
>>>>>>>>>> So what are you building here with StreamsBuilder? Streams (hint:
>>>>>>>>>> No)? And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>
>>>>>>> I am not sure if this is too much of a concern. In contrast to
>>>>>>> `KStreamBuilder` (singular) that contains `KStream` and thus puts the
>>>>>>> KStream concept in focus and thus degrades `KTable`, `StreamsBuilder`
>>>>>>> (plural) focuses on "Streams API". IMHO, it does not put focus on
>>>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>>>> worry what you are building -- and you don't need to think about the
>>>>>>> `Topology` concept (of course, you see that .build() returns a
>>>>>>> Topology).
>>>>>>>
>>>>>>>
>>>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>>>> Ram can follow up and share their thoughts?
>>>>>>>
>>>>>>> It would also help a lot if other people put their vote for a name, too.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>>>> Just to clarify, I did want to have the term `Topology` as part of
>>>>>>>> the class name, for the reasons above. I'm not too worried about being
>>>>>>>> consistent with the previous names, but I feel the
>>>>>>>> `XXTopologyBuilder` is better than `XXStreamsBuilder` since its
>>>>>>>> build() function returns a Topology object.
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>>>
>>>>>>>>> Basically we would have:
>>>>>>>>>
>>>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>>
>>>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>>>     // Ah right, you are composing a topology here, though you are not aware of it.
>>>>>>>>>
>>>>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>>>         streamsConfiguration);
>>>>>>>>>
>>>>>>>>> So what are you building here with StreamsBuilder? Streams (hint:
>>>>>>>>> No)? And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>>>
>>>>>>>>> I also interpret Guozhang's last response as that he'd prefer to
>>>>>>>>> have "Topology" in the class/interface names. I am aware that we
>>>>>>>>> shouldn't necessarily use the status quo to make decisions about future
>>>>>>>>> changes, but the very first concept we explain in the Kafka Streams
>>>>>>>>> documentation is "Stream Processing Topology":
>>>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>>>>
>>>>>>>>> -Michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J.
>>>>>>>>> Sax <matth...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>> \cc users list
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>> Organization: Confluent Inc
>>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>>
>>>>>>>>>> I want to push this discussion further.
>>>>>>>>>>
>>>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid.
>>>>>>>>>> It's a public class anyway, so it's not an issue. However, I think the
>>>>>>>>>> question is not too much about exposing but about "advertising" (ie,
>>>>>>>>>> putting it into the focus) or not at DSL level.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If I interpret the last replies correctly, it seems that we could
>>>>>>>>>> agree on "StreamsBuilder" as name. I did update the KIP accordingly.
>>>>>>>>>> Please correct me, if I got this wrong.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If there are no other objections -- this naming discussion was the
>>>>>>>>>> last open point so far -- I would like to start the VOTE thread.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>>>> I'd like to keep the term "Topology" inside the builder class
>>>>>>>>>>> since, as Matthias mentioned, this builder#build() function returns a
>>>>>>>>>>> "Topology" object, whose type is a public class anyways. Although you
>>>>>>>>>>> can argue to let users always call
>>>>>>>>>>>
>>>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>>>
>>>>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Guozhang
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J.
>>>>>>>>>>> Sax <matth...@confluent.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>>>
>>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>>>>> logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) We would reuse a name for a completely different purpose. This is
>>>>>>>>>>>> the same argument as for not renaming KStreamBuilder to TopologyBuilder.
>>>>>>>>>>>> The confusion would just be too large.
>>>>>>>>>>>>
>>>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but
>>>>>>>>>>>> now we cannot make this move, IMHO.
>>>>>>>>>>>>
>>>>>>>>>>>> Also a clarification question: do you suggest to have static
>>>>>>>>>>>> methods #stream and #table -- I am not sure if this would work?
>>>>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to
>>>>>>>>>>>> consumer and producer client. Thus, the name KafkaStreams aligns to the
>>>>>>>>>>>> naming schema of KafkaConsumer and KafkaProducer. I am not sure if it
>>>>>>>>>>>> would be a good choice to "break" this naming scheme.
>>>>>>>>>>>>
>>>>>>>>>>>> Btw: this is also the reason why we have KafkaStreams#close() --
>>>>>>>>>>>> and not KafkaStreams#stop() -- because #close() aligns with consumer
>>>>>>>>>>>> and producer client.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry
>>>>>>>>>>>> class would be that it would need to create a Topology that can be
>>>>>>>>>>>> given to the "runner/processing-client".
>>>>>>>>>>>> Thus the pattern would be
>>>>>>>>>>>>
>>>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>>>>
>>>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, there was the idea (that we intentionally
>>>>>>>>>>>> excluded from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling
>>>>>>>>>>>> "new") and the topology is provided as a constructor argument. However,
>>>>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI),
>>>>>>>>>>>> the DSL builder could create the client for the user.
>>>>>>>>>>>>
>>>>>>>>>>>> Something like this:
>>>>>>>>>>>>
>>>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>>>
>>>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>>>
>>>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder"
>>>>>>>>>>>> would change, as it does not create a topology anymore, but it creates
>>>>>>>>>>>> the "processing client". This would address Jay's concern about "not
>>>>>>>>>>>> exposing concepts users don't need to understand" and would not
>>>>>>>>>>>> require to include the word "Topology" in the DSL builder class name,
>>>>>>>>>>>> because the builder does not build a Topology anymore.
>>>>>>>>>>>>
>>>>>>>>>>>> I just put some names that came to my mind first hand -- did not
>>>>>>>>>>>> think about good names. It's just to discuss the pattern.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about being
>>>>>>>>>>>>> careful which concepts we do and do not expose, depending on which user
>>>>>>>>>>>>> group / user type is affected. That said, I'm not sure yet whether or
>>>>>>>>>>>>> not we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For what it's worth, here's how related technologies define/name their
>>>>>>>>>>>>> "topologies" and "builders". Note that, in all cases, it's about
>>>>>>>>>>>>> constructing a logical processing plan, which then is being
>>>>>>>>>>>>> executed/run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>>>       then attach it to your processing plan via
>>>>>>>>>>>>>       `Pipeline#apply(<source>)`.
>>>>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL the
>>>>>>>>>>>>>       builder does both, i.e. instantiating + auto-attaching to itself.
>>>>>>>>>>>>>     - To execute the processing plan you call `Pipeline#execute()`.
>>>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>>       `StreamingContext#execute()`.
>>>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our
>>>>>>>>>>>>>   DSL.
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
>>>>>>>>>>>>>   `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>>>     - You instantiate a Source directly, and then compose the Source
>>>>>>>>>>>>>       with Sinks to create a RunnableGraph:
>>>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
>>>>>>>>>>>>>       RunnableGraph[Mat]`.
>>>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams`
>>>>>>>>>>>>>       instance from `KStreamBuilder`
>>>>>>>>>>>>>       (where the builder will instantiate the topology = processing
>>>>>>>>>>>>>       plan to be executed), and then
>>>>>>>>>>>>>       call `KafkaStreams#start()`. Think of `KafkaStreams` as our
>>>>>>>>>>>>>       runner.
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, I agree with the sentiment that the current name of
>>>>>>>>>>>>> `KStreamBuilder` isn't great (which is why we're having this
>>>>>>>>>>>>> discussion). Also, that finding a good name is tricky. ;-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure
>>>>>>>>>>>>> whether I like the `StreamsBuilder` suggestion (i.e. any name that does
>>>>>>>>>>>>> not include "topology" or a similar term) that much more. It still
>>>>>>>>>>>>> doesn't describe what that class actually does, and what the difference
>>>>>>>>>>>>> to `KafkaStreams` is.
>>>>>>>>>>>>> IMHO, the point of `KStreamBuilder` is that it lets you build a
>>>>>>>>>>>>> logical plan (what we call "topology"), and `KafkaStreams` is the thing
>>>>>>>>>>>>> that executes that plan. I'm not yet convinced that abstracting these
>>>>>>>>>>>>> two points away from the user is a good idea if the argument is that
>>>>>>>>>>>>> it's potentially confusing to beginners (a claim which I am not sure is
>>>>>>>>>>>>> actually true).
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less
>>>>>>>>>>>>> technically correct names", I'd argue we should not even use something
>>>>>>>>>>>>> like "Builder". We could, for example, also pick the following names:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>>>   plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executor of the plan, with
>>>>>>>>>>>>>   `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology than
>>>>>>>>>>>>>>> any particular new replacement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J.
>>>>>>>>>>>>>>> Sax <matth...@confluent.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's the current name and I personally think it's not the best one.
>>>>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is that we have the
>>>>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates both.
>>>>>>>>>>>>>>>> However, the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I understand your argument, and I am personally open to removing the
>>>>>>>>>>>>>>>> "Topology" part, and naming it "StreamsBuilder". Not sure what others
>>>>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's out
>>>>>>>>>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking
>>>>>>>>>>>>>>>> internal APIs and doing some cleanup of how our API reflects some
>>>>>>>>>>>>>>>> concepts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, I added your idea to the API discussion Wiki page and we can
>>>>>>>>>>>>>>>> take it from there:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. This is a minor thing but the proposed new name for KStreamBuilder
>>>>>>>>>>>>>>>>>    is StreamsTopologyBuilder.
>>>>>>>>>>>>>>>>>    I actually think we should not put topology in
>>>>>>>>>>>>>>>>>    the name as topology is not a concept you need to understand at the
>>>>>>>>>>>>>>>>>    kstreams layer right now. I'd think of three categories of concepts:
>>>>>>>>>>>>>>>>>    (1) concepts you need to understand to get going even for a simple
>>>>>>>>>>>>>>>>>    example, (2) concepts you need to understand to operate and debug a
>>>>>>>>>>>>>>>>>    real production app, (3) concepts we truly abstract and you don't
>>>>>>>>>>>>>>>>>    need to ever understand. I think in the kstream layer topologies are
>>>>>>>>>>>>>>>>>    currently category (2), and this is where they belong. By
>>>>>>>>>>>>>>>>>    introducing the name in even the simplest example it means the user
>>>>>>>>>>>>>>>>>    has to go read about topologies to really understand even this
>>>>>>>>>>>>>>>>>    simple snippet. What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>>> 2. For the processor api, I think this api is mostly not for end
>>>>>>>>>>>>>>>>>    users. However there are a couple of cases where it might make sense
>>>>>>>>>>>>>>>>>    to expose it. I think users coming from Samza, or JMS's
>>>>>>>>>>>>>>>>>    MessageListener (
>>>>>>>>>>>>>>>>>    https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>>>>    understand a simple callback interface for message processing. In
>>>>>>>>>>>>>>>>>    fact, people often ask why Kafka's consumer doesn't provide such an
>>>>>>>>>>>>>>>>>    interface. I'd argue we do, it's KafkaStreams.
>>>>>>>>>>>>>>>>>    The only issue is that the processor
>>>>>>>>>>>>>>>>>    API documentation is a bit scary for a person implementing this type
>>>>>>>>>>>>>>>>>    of api. My observation is that people using this style of API don't
>>>>>>>>>>>>>>>>>    do a lot of cross-message operations, they just do single message
>>>>>>>>>>>>>>>>>    operations and use a database for anything that spans messages. They
>>>>>>>>>>>>>>>>>    also don't factor their code into many MessageListeners and compose
>>>>>>>>>>>>>>>>>    them, they just have one listener that has the complete handling
>>>>>>>>>>>>>>>>>    logic. Say I am a user who wants to implement a single Processor in
>>>>>>>>>>>>>>>>>    this style. Do we have an easy way to do that today (either with the
>>>>>>>>>>>>>>>>>    .transform/.process methods in kstreams or with the topology apis)?
>>>>>>>>>>>>>>>>>    Is there anything we can do in the way of trivial helper code to
>>>>>>>>>>>>>>>>>    make this better? Also, how can we explain that pattern to people? I
>>>>>>>>>>>>>>>>>    think currently we have pretty in-depth docs on our apis but I
>>>>>>>>>>>>>>>>>    suspect a person trying to figure out how to implement a simple
>>>>>>>>>>>>>>>>>    callback might get a bit lost trying to figure out how to wire it
>>>>>>>>>>>>>>>>>    up. A simple five line example in the docs would probably help a
>>>>>>>>>>>>>>>>>    lot. Not sure if this is best addressed in this KIP or is a side
>>>>>>>>>>>>>>>>>    comment.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I did prepare a KIP to do some cleanup of some of Kafka's Streaming
>>>>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>
>>>>> --
>>>>> -- Guozhang