Just a small correction to KIP-120:

I changed it to use `Set` instead of `List` within `TopologyDescription`
and highlighted that we use getters (added a couple of `()`). I also moved
`name` to the `Node` interface, as all nodes have a name.

Also note that I updated `GlobalStore` -- it's more detailed now (it
was incomplete before): the original information is contained in the
nested nodes.
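
To make the shape of these changes concrete, here is a rough sketch. It is
not the KIP text: the `predecessors()`/`successors()`/`nodes()` getters and
the `SimpleNode` helper are assumptions for illustration only. It only shows
`Set`-typed getters, `name()` living on `Node`, and an `id()` on
`Subtopology`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TopologyDescriptionSketch {

    /** Every node has a name, so the getter lives on the common interface. */
    interface Node {
        String name();
        Set<Node> predecessors(); // assumed getter, for illustration
        Set<Node> successors();   // assumed getter, for illustration
    }

    /** A subtopology groups nodes; the id is what could link it to a task. */
    interface Subtopology {
        int id();
        Set<Node> nodes();        // assumed getter, for illustration
    }

    /** Getters return Set instead of List: no order, no duplicates. */
    interface TopologyDescription {
        Set<Subtopology> subtopologies();
    }

    /** Hypothetical minimal Node implementation, not part of any real API. */
    static final class SimpleNode implements Node {
        private final String name;
        private final Set<Node> predecessors = new LinkedHashSet<>();
        private final Set<Node> successors = new LinkedHashSet<>();

        SimpleNode(String name) { this.name = name; }

        @Override public String name() { return name; }
        @Override public Set<Node> predecessors() { return predecessors; }
        @Override public Set<Node> successors() { return successors; }

        /** Wires this node to a downstream node in both directions. */
        void connectTo(SimpleNode downstream) {
            successors.add(downstream);
            downstream.predecessors.add(this);
        }
    }

    public static void main(String[] args) {
        SimpleNode source = new SimpleNode("source");
        SimpleNode processor = new SimpleNode("processor");
        source.connectTo(processor);
        source.connectTo(processor); // adding the same edge twice is a no-op in a Set

        for (Node n : source.successors()) {
            System.out.println(source.name() + " -> " + n.name());
        }
    }
}
```

With `Set`, wiring the same edge twice does not produce a duplicate entry,
which is one reason it fits better than `List` here.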

https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=67641273&selectedPageVersions=20&selectedPageVersions=17



-Matthias

On 3/28/17 7:01 PM, Matthias J. Sax wrote:
> With regard to KIP-130:
> 
> From the KIP-130 thread:
> 
>> About subtopologies and tasks. We do have the concept of subtopologies
>> already in KIP-120. It's only missing an ID that allows linking a
>> subtopology to a task.
>>
>> IMHO, adding a simple variable to `Subtopology` that provides the id should be
>> sufficient. We can simply document in the JavaDocs how Subtopology and
>> TaskMetadata can be linked to each other.
> 
> I updated KIP-120 to include a field for this.
> 
> 
> -Matthias
> 
> 
> On 3/27/17 4:27 PM, Matthias J. Sax wrote:
>> Hi,
>>
>> I would like to trigger this discussion again. It seems that the naming
>> question is rather subjective and both main alternatives (w/ or w/o the
>> word "Topology" in the name) have pros/cons.
>>
>> If you have any further thoughts, please share them. At the moment I still
>> propose `StreamsBuilder` in the KIP.
>>
>> I also want to point out that the VOTE thread was already started. So
>> if you like the current KIP, please cast your vote there.
>>
>>
>> Thanks a lot!
>>
>>
>> -Matthias
>>
>>
>> On 3/23/17 3:38 PM, Matthias J. Sax wrote:
>>> Jay,
>>>
>>> about the naming schema:
>>>
>>>>>    1. "kstreams" - the DSL
>>>>>    2. "processor api" - the lower level callback/topology api
>>>>>    3. KStream/KTable - entities in the kstreams dsl
>>>>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>>>    including both kstreams and the processor API plus the underlying
>>>>>    implementation.
>>>
>>> I think this terminology has some issues... To me, `kstreams` was
>>> never more than an abbreviation for `Kafka Streams` -- thus (1) and
>>> (4) kinda collide here. Following questions on the mailing list etc., I
>>> often see people using kstreams or kstream exactly as an abbreviation for
>>> "Kafka Streams".
>>>
>>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>>> particularly confusing.
>>>
>>> I disagree here. It's a very subtle difference between `kstreams` and
>>> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
>>> it's just too close to each other.
>>>
>>> Thus, I really think it's a good idea to get a new name for the DSL to
>>> get a better separation of the 4 concepts.
>>>
>>> Furthermore, we use the term "Streams API". Thus, I think
>>> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
>>>
>>>
>>> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
>>>
>>> I will start a VOTE thread. Of course, we can still discuss the naming
>>> issue. :)
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 3/22/17 8:53 PM, Jay Kreps wrote:
>>>> I don't feel strongly on this, so I'm happy with whatever everyone else
>>>> wants.
>>>>
>>>> Michael, I'm not arguing that people don't need to understand topologies, I
>>>> just think it is like rocks db, you need to understand it when
>>>> debugging/operating but not in the initial coding since the metaphor we're
>>>> providing at this layer isn't a topology of processors but rather something
>>>> like the collections api. Anyhow it won't hurt people to have it there.
>>>>
>>>> For the original KStreamBuilder thing, I think that came from the naming we
>>>> discussed originally:
>>>>
>>>>    1. "kstreams" - the DSL
>>>>    2. "processor api" - the lower level callback/topology api
>>>>    3. KStream/KTable - entities in the kstreams dsl
>>>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>>    including both kstreams and the processor API plus the underlying
>>>>    implementation.
>>>>
>>>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>>>> particularly confusing. Just like referring to the "java collections
>>>> library" isn't confusing even though it contains the Iterator interface
>>>> which is not actually itself a collection.
>>>>
>>>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>>>> is intended not to be a builder of a KStream but rather the builder for the
>>>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>>>
>>>> -Jay
>>>>
>>>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>
>>>>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder`
>>>>> that are going to be used in the DSL, I agree both have their arguments:
>>>>>
>>>>> 1. On one side, people using the DSL layer probably do not need to be
>>>>> aware of (or rather, "learn about") the "topology" concept, although this
>>>>> concept is a publicly exposed one in Kafka Streams.
>>>>>
>>>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>>>
>>>>>
>>>>> Since the second bullet point seems to be more "subjective" and many
>>>>> people are not worried about it, I'm OK to go with the other option.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Forwarding to kafka-user.
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Michael Noll <mich...@confluent.io>
>>>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>> To: dev@kafka.apache.org
>>>>>>
>>>>>>
>>>>>> Matthias,
>>>>>>
>>>>>>> @Michael:
>>>>>>>
>>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>>> little surprised by your last response, that goes the opposite
>>>>>>> direction).
>>>>>>
>>>>>> Oh, sorry for not being clear.
>>>>>>
>>>>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>>>> much and which parts of the API/concept "surface" we expose to users of
>>>>>> the DSL.  However, and this is perhaps where I wasn't very clear, I
>>>>>> disagree on the particular opinion about not exposing the topology
>>>>>> concept to DSL users.  Instead, I think the concept of a topology is
>>>>>> important to understand even for DSL users -- particularly because of the
>>>>>> way the DSL is currently wiring your processing logic via the builder
>>>>>> pattern.  (As I noted, e.g. Akka uses a different approach where you might
>>>>>> be able to get away with not exposing the "topology" concept, but even in
>>>>>> Akka there's the notion of graphs and flows.)
>>>>>>
>>>>>>
>>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>>>     // not aware of it.
>>>>>>>
>>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>>>> don't put the Topology concept in the focus...
>>>>>>
>>>>>> Let me turn this around, because that was my point: it's confusing to
>>>>>> have a name "StreamsBuilder" if that thing isn't building streams, and it
>>>>>> is not.
>>>>>>
>>>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>>>> your processing, and (2) the execution of that plan.  I have a less
>>>>>> strong opinion whether or not having "topology" in the names would help to
>>>>>> communicate this separation as well as combination of (1) and (2) to make
>>>>>> your app work as expected.
>>>>>>
>>>>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>>>>> "topology" in the name, then perhaps we should rename `KStreamBuilder` to
>>>>>> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of
>>>>>> (1) and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>>>> streams".
>>>>>>
>>>>>> Also, I think some of the naming challenges we're discussing here are
>>>>>> caused by having this builder pattern in the first place.  If the Streams
>>>>>> API was implemented in Scala, for example, we could use implicits for
>>>>>> helping us to "stitch streams/tables together to build the full
>>>>>> topology", thus using a different (better?) approach to composing your
>>>>>> topologies than through a builder pattern.  So: perhaps there's a better
>>>>>> way than the builder, and that way would also be clearer on terminology?
>>>>>> That said, this might take this KIP off-scope.
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> @Guozhang:
>>>>>>>
>>>>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>>>>> myself).
>>>>>>>
>>>>>>> @Michael:
>>>>>>>
>>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>>> little surprised by your last response, that goes the opposite
>>>>>>> direction).
>>>>>>>
>>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>
>>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>>>     // not aware of it.
>>>>>>>
>>>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>>>> don't put the Topology concept in the focus...
>>>>>>>
>>>>>>> Furthermore,
>>>>>>>
>>>>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>>>>>> No)?
>>>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>
>>>>>>> I am not sure if this is too much of a concern. In contrast to
>>>>>>> `KStreamBuilder` (singular) that contains `KStream` and thus puts the
>>>>>>> KStream concept in focus and thus degrades `KTable`, `StreamsBuilder`
>>>>>>> (plural) focuses on "Streams API". IMHO, it does not put focus on
>>>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>>>> worry what you are building -- and you don't need to think about the
>>>>>>> `Topology` concept (of course, you see that .build() returns a
>>>>>>> Topology).
>>>>>>>
>>>>>>>
>>>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>>>> Ram can follow up and share their thoughts?
>>>>>>>
>>>>>>> It would also help a lot if other people put in their vote for a name, too.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>>>> Just to clarify, I did want to have the term `Topology` as part of
>>>>>>>> the class name, for the reasons above. I'm not too worried about being
>>>>>>>> consistent with the previous names, but I feel that
>>>>>>>> `XXTopologyBuilder` is better than `XXStreamsBuilder` since its build()
>>>>>>>> function returns a Topology object.
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io
>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>>>
>>>>>>>>> Basically we would have:
>>>>>>>>>
>>>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>>>
>>>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>>>>     // not aware of it.
>>>>>>>>>
>>>>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>>>         streamsConfiguration);
>>>>>>>>>
>>>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>>>>> No)?
>>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>>>
>>>>>>>>> I also interpret Guozhang's last response as that he'd prefer to
>>>>>>>>> have "Topology" in the class/interface names.  I am aware that we
>>>>>>>>> shouldn't necessarily use the status quo to make decisions about future
>>>>>>>>> changes, but the very first concept we explain in the Kafka Streams
>>>>>>>>> documentation is "Stream Processing Topology":
>>>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>>>>
>>>>>>>>> -Michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <
>>>>>> matth...@confluent.io
>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> \cc users list
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>> Organization: Confluent Inc
>>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>>
>>>>>>>>>> I want to push this discussion further.
>>>>>>>>>>
>>>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid.
>>>>>>>>>> It's a public class anyway, so it's not an issue. However, I think the
>>>>>>>>>> question is not so much about exposing but about "advertising" (ie,
>>>>>>>>>> putting it into the focus) or not at DSL level.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If I interpret the last replies correctly, it seems that we could
>>>>>>>>>> agree on "StreamsBuilder" as name. I did update the KIP accordingly.
>>>>>>>>>> Please correct me, if I got this wrong.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If there are no other objections -- this naming discussion was the
>>>>>>>>>> last open point so far -- I would like to start the VOTE thread.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>>>> I'd like to keep the term "Topology" inside the builder class
>>>>>>>>>>> since, as Matthias mentioned, this builder#build() function returns a
>>>>>>>>>>> "Topology" object, whose type is a public class anyways. Although you
>>>>>>>>>>> can argue to let users always call
>>>>>>>>>>>
>>>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>>>
>>>>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Guozhang
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <
>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>>>
>>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>>>>> logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) We would reuse a name for a completely different purpose. This is
>>>>>>>>>>>> the same argument as for not renaming KStreamBuilder to TopologyBuilder.
>>>>>>>>>>>> The confusion would just be too large.
>>>>>>>>>>>>
>>>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but
>>>>>>>>>>>> now we cannot make this move, IMHO.
>>>>>>>>>>>>
>>>>>>>>>>>> Also a clarification question: do you suggest to have static
>>>>>>>>>>>> methods #stream and #table -- I am not sure if this would work?
>>>>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to the
>>>>>>>>>>>> consumer and producer clients. Thus, the name KafkaStreams aligns with
>>>>>>>>>>>> the naming scheme of KafkaConsumer and KafkaProducer. I am not sure if
>>>>>>>>>>>> it would be a good choice to "break" this naming scheme.
>>>>>>>>>>>>
>>>>>>>>>>>> Btw: this is also the reason why we have KafkaStreams#close() --
>>>>>>>>>>>> and not KafkaStreams#stop() -- because #close() aligns with the consumer
>>>>>>>>>>>> and producer clients.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry
>>>>>>>>>>>> class would be that it would need to create a Topology that can be
>>>>>>>>>>>> given to the "runner/processing-client". Thus the pattern would be
>>>>>>>>>>>>
>>>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>>>>
>>>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, there was the idea (that we intentionally
>>>>>>>>>>>> excluded from the KIP) to change the "client instantiation" pattern.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling
>>>>>>>>>>>> "new") and the topology is provided as a constructor argument. However,
>>>>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI),
>>>>>>>>>>>> the DSL builder could create the client for the user.
>>>>>>>>>>>>
>>>>>>>>>>>> Something like this:
>>>>>>>>>>>>
>>>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>>>
>>>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>>>
>>>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder"
>>>>>>>>>>>> would change, as it does not create a topology anymore, but it creates
>>>>>>>>>>>> the "processing client". This would address Jay's concern about "not
>>>>>>>>>>>> exposing concepts users don't need to understand" and would not
>>>>>>>>>>>> require including the word "Topology" in the DSL builder class name,
>>>>>>>>>>>> because the builder does not build a Topology anymore.
>>>>>>>>>>>>
>>>>>>>>>>>> I just put some names that came to my mind first hand -- did not
>>>>>>>>>>>> think about good names. It's just to discuss the pattern.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about being
>>>>>>>>>>>>> careful which concepts we do and do not expose, depending on which user
>>>>>>>>>>>>> group / user type is affected.  That said, I'm not sure yet whether or
>>>>>>>>>>>>> not we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For what it's worth, here's how related technologies define/name their
>>>>>>>>>>>>> "topologies" and "builders".  Note that, in all cases, it's about
>>>>>>>>>>>>> constructing a logical processing plan, which then is being executed/run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>>>       then attach it to your processing plan via
>>>>>>>>>>>>>       `Pipeline#apply(<source>)`.
>>>>>>>>>>>>>       This setup is a bit different to our DSL because in our
>>>>>>>>>>>>>       DSL the builder does both, i.e.
>>>>>>>>>>>>>       instantiating + auto-attaching to itself.
>>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>>       `Pipeline#execute()`.
>>>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>>       `StreamingContext#execute()`.
>>>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to
>>>>>>>>>>>>>   our DSL.
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing
>>>>>>>>>>>>>   Sources (~ `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>>>     - You instantiate a Source directly, and then compose the
>>>>>>>>>>>>>       Source with Sinks to create a RunnableGraph:
>>>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
>>>>>>>>>>>>>       RunnableGraph[Mat]`.
>>>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>>>       `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams`
>>>>>>>>>>>>>       instance from `KStreamBuilder`
>>>>>>>>>>>>>       (where the builder will instantiate the topology =
>>>>>>>>>>>>>       processing plan to be executed), and then
>>>>>>>>>>>>>       call `KafkaStreams#start()`.  Think of `KafkaStreams` as
>>>>>>>>>>>>>       our runner.
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, I agree with the sentiment that the current name of
>>>>>>>>>>>>> `KStreamBuilder` isn't great (which is why we're having this discussion).
>>>>>>>>>>>>> Also, that finding a good name is tricky. ;-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure
>>>>>>>>>>>>> whether I like the `StreamsBuilder` suggestion (i.e. any name that does
>>>>>>>>>>>>> not include "topology" or a similar term) that much more.  It still
>>>>>>>>>>>>> doesn't describe what that class actually does, and what the difference
>>>>>>>>>>>>> to `KafkaStreams` is.  IMHO, the point of `KStreamBuilder` is that it lets
>>>>>>>>>>>>> you build a logical plan (what we call "topology"), and `KafkaStreams` is
>>>>>>>>>>>>> the thing that executes that plan.  I'm not yet convinced that abstracting
>>>>>>>>>>>>> these two points away from the user is a good idea if the argument is
>>>>>>>>>>>>> that it's potentially confusing to beginners (a claim which I am not sure
>>>>>>>>>>>>> is actually true).
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less technically
>>>>>>>>>>>>> correct names", I'd argue we should not even use something like
>>>>>>>>>>>>> "Builder".  We could, for example, also pick the following names:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>>>   plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executor of the plan,
>>>>>>>>>>>>>   with `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <
>>>>>>>>> r...@confluent.io>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io>
>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology than
>>>>>>>>>>>>>>> any particular new replacement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <
>>>>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's the current name and I personally think it's not the best one.
>>>>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is that we have the
>>>>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates both.
>>>>>>>>>>>>>>>> However, the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I understand your argument, and I am personally open to removing the
>>>>>>>>>>>>>>>> "Topology" part and naming it "StreamsBuilder". Not sure what others
>>>>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's out
>>>>>>>>>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking
>>>>>>>>>>>>>>>> internal APIs and doing some cleanup of how our API reflects some
>>>>>>>>>>>>>>>> concepts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, I added your idea to the API discussion Wiki page and we can
>>>>>>>>>>>>>>>> take it from there:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   1. This is a minor thing but the proposed new name for
>>>>>>>>>>>>>>>>>   KStreamBuilder is StreamsTopologyBuilder. I actually think we should
>>>>>>>>>>>>>>>>>   not put topology in the name as topology is not a concept you need to
>>>>>>>>>>>>>>>>>   understand at the kstreams layer right now. I'd think of three
>>>>>>>>>>>>>>>>>   categories of concepts: (1) concepts you need to understand to get
>>>>>>>>>>>>>>>>>   going even for a simple example, (2) concepts you need to understand
>>>>>>>>>>>>>>>>>   to operate and debug a real production app, (3) concepts we truly
>>>>>>>>>>>>>>>>>   abstract and you don't need to ever understand. I think in the
>>>>>>>>>>>>>>>>>   kstream layer topologies are currently category (2), and this is
>>>>>>>>>>>>>>>>>   where they belong. By introducing the name in even the simplest
>>>>>>>>>>>>>>>>>   example it means the user has to go read about topologies to really
>>>>>>>>>>>>>>>>>   understand even this simple snippet. What if instead we called it
>>>>>>>>>>>>>>>>>   KStreamsBuilder?
>>>>>>>>>>>>>>>>>   2. For the processor api, I think this api is mostly not for end
>>>>>>>>>>>>>>>>>   users. However there are a couple of cases where it might make sense
>>>>>>>>>>>>>>>>>   to expose it. I think users coming from Samza, or JMS's
>>>>>>>>>>>>>>>>>   MessageListener (
>>>>>>>>>>>>>>>>>   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>>>>   understand a simple callback interface for message processing. In
>>>>>>>>>>>>>>>>>   fact, people often ask why Kafka's consumer doesn't provide such an
>>>>>>>>>>>>>>>>>   interface. I'd argue we do, it's KafkaStreams. The only issue is that
>>>>>>>>>>>>>>>>>   the processor API documentation is a bit scary for a person
>>>>>>>>>>>>>>>>>   implementing this type of api. My observation is that people using
>>>>>>>>>>>>>>>>>   this style of API don't do a lot of cross-message operations, they
>>>>>>>>>>>>>>>>>   just do single message operations and use a database for anything
>>>>>>>>>>>>>>>>>   that spans messages. They also don't factor their code into many
>>>>>>>>>>>>>>>>>   MessageListeners and compose them, they just have one listener that
>>>>>>>>>>>>>>>>>   has the complete handling logic. Say I am a user who wants to
>>>>>>>>>>>>>>>>>   implement a single Processor in this style. Do we have an easy way to
>>>>>>>>>>>>>>>>>   do that today (either with the .transform/.process methods in
>>>>>>>>>>>>>>>>>   kstreams or with the topology apis)? Is there anything we can do in
>>>>>>>>>>>>>>>>>   the way of trivial helper code to make this better? Also, how can we
>>>>>>>>>>>>>>>>>   explain that pattern to people? I think currently we have pretty
>>>>>>>>>>>>>>>>>   in-depth docs on our apis but I suspect a person trying to figure out
>>>>>>>>>>>>>>>>>   how to implement a simple callback might get a bit lost trying to
>>>>>>>>>>>>>>>>>   figure out how to wire it up. A simple five line example in the docs
>>>>>>>>>>>>>>>>>   would probably help a lot. Not sure if this is best addressed in this
>>>>>>>>>>>>>>>>>   KIP or is a side comment.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <
>>>>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I prepared a KIP to clean up some of Kafka's Streaming API.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>>
>>>>
>>>
>>
> 

