Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Matthias J. Sax Thu, 23 Mar 2017 15:38:47 -0700

Jay,

About the naming scheme:


>>    1. "kstreams" - the DSL
>>    2. "processor api" - the lower level callback/topology api
>>    3. KStream/KTable - entities in the kstreams dsl
>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>    including both kstreams and the processor API plus the underlying
>>    implementation.

I think this terminology has some issues... To me, `kstreams` was
never more than an abbreviation for `Kafka Streams` -- thus (1) and
(4) kinda collide here. Following questions on the mailing list etc., I
often see people using kstreams or kstream exactly as an abbreviation for
"Kafka Streams".

> I think referring to the dsl as "kstreams" is cute and mnemonic and not
> particularly confusing.

I disagree here. There is only a very subtle difference between `kstreams`
and `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
they are just too close to each other.

Thus, I really think it's a good idea to get a new name for the DSL to
get a better separation of the 4 concepts.

Furthermore, we use the term "Streams API". Thus, I think
`StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.


Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
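
To make the split concrete, here is a minimal sketch of the pattern under
discussion: a builder that collects stream and table sources, a build() call
that returns the logical plan, and a separate runner that takes that plan.
All `Sketch*` classes are hypothetical stand-ins written for illustration only,
not the real Kafka Streams classes, and the string-tagged sources are an
assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the real Kafka Streams classes.

/** Immutable logical plan: in this sketch, just the list of registered sources. */
final class SketchTopology {
    private final List<String> sources;

    SketchTopology(List<String> sources) {
        // Defensive copy so later builder calls cannot change a built plan.
        this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
    }

    List<String> sources() {
        return sources;
    }
}

/** DSL entry point: collects sources and hands back the plan via build(). */
final class SketchStreamsBuilder {
    private final List<String> sources = new ArrayList<>();

    // Registers a stream source; returns the builder to allow chaining.
    SketchStreamsBuilder stream(String topic) {
        sources.add("stream:" + topic);
        return this;
    }

    // Registers a table source; the builder name favors neither streams nor tables.
    SketchStreamsBuilder table(String topic) {
        sources.add("table:" + topic);
        return this;
    }

    // Returns the logical plan; the caller never has to name it explicitly.
    SketchTopology build() {
        return new SketchTopology(sources);
    }
}

/** Runner: takes a plan (here it only describes what it would execute). */
final class SketchKafkaStreams {
    private final SketchTopology topology;

    SketchKafkaStreams(SketchTopology topology) {
        this.topology = topology;
    }

    String describe() {
        return "running " + topology.sources();
    }
}
```

With these stand-ins, `new SketchKafkaStreams(builder.build())` works as a
one-liner, so the plan type only shows up if the user chooses to hold on to it.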

I will start a VOTE thread. Of course, we can still discuss the naming
issue. :)



-Matthias


On 3/22/17 8:53 PM, Jay Kreps wrote:
> I don't feel strongly on this, so I'm happy with whatever everyone else
> wants.
> 
> Michael, I'm not arguing that people don't need to understand topologies, I
> just think it is like rocks db, you need to understand it when
> debugging/operating but not in the initial coding since the metaphor we're
> providing at this layer isn't a topology of processors but rather something
> like the collections api. Anyhow it won't hurt people to have it there.
> 
> For the original KStreamBuilder thing, I think that came from the naming we
> discussed originally:
> 
>    1. "kstreams" - the DSL
>    2. "processor api" - the lower level callback/topology api
>    3. KStream/KTable - entities in the kstreams dsl
>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>    including both kstreams and the processor API plus the underlying
>    implementation.
> 
> I think referring to the dsl as "kstreams" is cute and mnemonic and not
> particularly confusing. Just like referring to the "java collections
> library" isn't confusing even though it contains the Iterator interface
> which is not actually itself a collection.
> 
> So I think KStreamBuilder should technically have been KstreamsBuilder and
> is intended not to be a builder of a KStream but rather the builder for the
> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
> 
> -Jay
> 
> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder` that
>> is going to be used in the DSL, I agree both have their arguments:
>>
>> 1. On one side, people using the DSL layer probably do not need to be aware
>> (or rather, "learn about") of the "topology" concept, although this concept
>> is a publicly exposed one in Kafka Streams.
>>
>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>> sounds a little weird, at least to me (admittedly subjective matter).
>>
>>
>> Since the second bullet point seems to be more "subjective" and many people
>> are not worried about it, I'm OK to go with the other option.
>>
>>
>> Guozhang
>>
>>
>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>> wrote:
>>
>>> Forwarding to kafka-user.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Michael Noll <mich...@confluent.io>
>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>> To: dev@kafka.apache.org
>>>
>>>
>>> Matthias,
>>>
>>>> @Michael:
>>>>
>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>> little surprised by your last response, that goes the opposite
>>>> direction).
>>>
>>> Oh, sorry for not being clear.
>>>
>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>> much and which parts of the API/concept "surface" we expose to users of
>>> the DSL.  However, and this is perhaps where I wasn't very clear, I
>>> disagree on the particular opinion about not exposing the topology
>>> concept to DSL users.  Instead, I think the concept of a topology is
>>> important to understand even for DSL users -- particularly because of the
>>> way the DSL is currently wiring your processing logic via the builder
>>> pattern.  (As I noted, e.g. Akka uses a different approach where you might
>>> be able to get away with not exposing the "topology" concept, but even in
>>> Akka there's the notion of graphs and flows.)
>>>
>>>
>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>
>>>>>     // And here you'd define your...well, what actually?
>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>     // not aware of it.
>>>>
>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>> don't put the Topology concept in the focus...
>>>
>>> Let me turn this around, because that was my point: it's confusing to
>>> have a name "StreamsBuilder" if that thing isn't building streams, and it
>>> is not.
>>>
>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>> users that there are two aspects at play: (1) defining the logic/plan of
>>> your processing, and (2) the execution of that plan.  I have a less
>>> strong opinion whether or not having "topology" in the names would help to
>>> communicate this separation as well as combination of (1) and (2) to make
>>> your app work as expected.
>>>
>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>> "topology" in the name, then perhaps we should rename `KStreamBuilder` to
>>> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of
>>> (1) and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>> streams".
>>>
>>> Also, I think some of the naming challenges we're discussing here are
>>> caused by having this builder pattern in the first place.  If the Streams
>>> API was implemented in Scala, for example, we could use implicits for
>>> helping us to "stitch streams/tables together to build the full
>>> topology", thus using a different (better?) approach to composing your
>>> topologies than through a builder pattern.  So: perhaps there's a better
>>> way than the builder, and that way would also be clearer on terminology?
>>> That said, this might take this KIP off-scope.
>>>
>>> -Michael
>>>
>>>
>>>
>>>
>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>
>>>> @Guozhang:
>>>>
>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>> myself).
>>>>
>>>> @Michael:
>>>>
>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>> little surprised by your last response, that goes the opposite
>>>> direction).
>>>>
>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>
>>>>>     // And here you'd define your...well, what actually?
>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>     // not aware of it.
>>>>
>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>> don't put the Topology concept in the focus...
>>>>
>>>> Furthermore,
>>>>
>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>>> No)?
>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>
>>>> I am not sure if this is much of a concern. In contrast to
>>>> `KStreamBuilder` (singular), which contains `KStream` and thus puts the
>>>> KStream concept in focus and degrades `KTable`, `StreamsBuilder`
>>>> (plural) focuses on "Streams API". IMHO, it does not put focus on
>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>> worry what you are building -- and you don't need to think about the
>>>> `Topology` concept (of course, you see that .build() returns a
>>>> Topology).
>>>>
>>>>
>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>> Ram can follow up and share their thoughts?
>>>>
>>>> It would also help a lot if other people put in their vote for a name, too.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>> Just to clarify, I did want to have the term `Topology` as part of
>>>>> the class name, for the reasons above. I'm not too worried about being
>>>>> consistent with the previous names, but I feel that `XXTopologyBuilder`
>>>>> is better than `XXStreamsBuilder` since its build() function returns a
>>>>> Topology object.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>
>>>>>> Basically we would have:
>>>>>>
>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>>     // And here you'd define your...well, what actually?
>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>     // not aware of it.
>>>>>>
>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>         streamsConfiguration);
>>>>>>
>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>> No)?  And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>
>>>>>> I also interpret Guozhang's last response as that he'd prefer to
>>>>>> have "Topology" in the class/interface names.  I am aware that we
>>>>>> shouldn't necessarily use the status quo to make decisions about future
>>>>>> changes, but the very first concept we explain in the Kafka Streams
>>>>>> documentation is "Stream Processing Topology":
>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> \cc users list
>>>>>>>
>>>>>>>
>>>>>>> -------- Forwarded Message --------
>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>> Organization: Confluent Inc
>>>>>>> To: dev@kafka.apache.org
>>>>>>>
>>>>>>> I want to push this discussion further.
>>>>>>>
>>>>>>> Guozhang's argument about "exposing" the Topology class is valid.
>>>>>>> It's a public class anyway, so it's not an issue. However, I think the
>>>>>>> question is not so much about exposing but about "advertising" (ie,
>>>>>>> putting it into the focus) or not at DSL level.
>>>>>>>
>>>>>>>
>>>>>>> If I interpret the last replies correctly, it seems that we could
>>>>>>> agree on "StreamsBuilder" as name. I did update the KIP accordingly.
>>>>>>> Please correct me, if I got this wrong.
>>>>>>>
>>>>>>>
>>>>>>> If there are no other objections -- this naming discussion was the
>>>>>>> last open point so far -- I would like to start the VOTE thread.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>> I'd like to keep the term "Topology" inside the builder class
>>>>>>>> since, as Matthias mentioned, this builder#build() function returns a
>>>>>>>> "Topology" object, whose type is a public class anyways. Although you
>>>>>>>> can argue to let users always call
>>>>>>>>
>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>
>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>
>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>> logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>
>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>
>>>>>>>>> (1) We would reuse a name for a completely different purpose. The
>>>>>>>>> same argument applies for not renaming KStreamBuilder to
>>>>>>>>> TopologyBuilder. The confusion would just be too large.
>>>>>>>>>
>>>>>>>>> So if we would start from scratch, it might be ok to do so, but
>>>>>>>>> now we cannot make this move, IMHO.
>>>>>>>>>
>>>>>>>>> Also a clarification question: do you suggest to have static
>>>>>>>>> methods #stream and #table -- I am not sure if this would work?
>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to the
>>>>>>>>> consumer and producer clients. Thus, the name KafkaStreams aligns
>>>>>>>>> with the naming scheme of KafkaConsumer and KafkaProducer. I am not
>>>>>>>>> sure if it would be a good choice to "break" this naming scheme.
>>>>>>>>>
>>>>>>>>> Btw: this is also the reason why we have KafkaStreams#close() --
>>>>>>>>> and not KafkaStreams#stop() -- because #close() aligns with the
>>>>>>>>> consumer and producer clients.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry
>>>>>>>>> class would be that it would need to create a Topology that can be
>>>>>>>>> given to the "runner/processing-client". Thus the pattern would be
>>>>>>>>>
>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology);
>>>>>>>>>
>>>>>>>>> (or of course as a one liner).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On the other hand, there was the idea (that we intentionally
>>>>>>>>> excluded from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>
>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling
>>>>>>>>> "new") and the topology is provided as a constructor argument.
>>>>>>>>> However, especially for DSL (not sure if it would make sense for
>>>>>>>>> PAPI), the DSL builder could create the client for the user.
>>>>>>>>>
>>>>>>>>> Something like this:
>>>>>>>>>
>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>
>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>
>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder"
>>>>>>>>> would change, as it does not create a topology anymore, but it
>>>>>>>>> creates the "processing client". This would address Jay's concern
>>>>>>>>> about "not exposing concepts users don't need to understand" and
>>>>>>>>> would not require including the word "Topology" in the DSL builder
>>>>>>>>> class name, because the builder does not build a Topology anymore.
>>>>>>>>>
>>>>>>>>> I just put some names that came to my mind first hand -- did not
>>>>>>>>> think about good names. It's just to discuss the pattern.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about
>>>>>>>>>> being careful which concepts we do and do not expose, depending on
>>>>>>>>>> which user group / user type is affected.  That said, I'm not sure
>>>>>>>>>> yet whether or not we should get rid of "Topology" (or a similar
>>>>>>>>>> term) in the DSL.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, here's how related technologies define/name
>>>>>>>>>> their "topologies" and "builders".  Note that, in all cases, it's
>>>>>>>>>> about constructing a logical processing plan, which then is being
>>>>>>>>>> executed/run.
>>>>>>>>>>
>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>       then attach it to your processing plan via
>>>>>>>>>>       `Pipeline#apply(<source>)`.
>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL
>>>>>>>>>>       the builder does both, i.e.
>>>>>>>>>>       instantiating + auto-attaching to itself.
>>>>>>>>>>     - To execute the processing plan you call `Pipeline#run()`.
>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>       `StreamingContext#start()`.
>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to
>>>>>>>>>>   our DSL.
>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources
>>>>>>>>>>   (~ `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>     - You instantiate a Source directly, and then compose the
>>>>>>>>>>       Source with Sinks to create a RunnableGraph:
>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out],
>>>>>>>>>>       Mat2]): RunnableGraph[Mat]`.
>>>>>>>>>>     - To execute the processing plan you call `RunnableGraph#run()`.
>>>>>>>>>>
>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>
>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>       `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams`
>>>>>>>>>>       instance from `KStreamBuilder`
>>>>>>>>>>       (where the builder will instantiate the topology =
>>>>>>>>>>       processing plan to be executed), and then
>>>>>>>>>>       call `KafkaStreams#start()`.  Think of `KafkaStreams` as our
>>>>>>>>>>       runner.
>>>>>>>>>>
>>>>>>>>>> First, I agree with the sentiment that the current name of
>>>>>>>>>> `KStreamBuilder` isn't great (which is why we're having this
>>>>>>>>>> discussion).  Also, that finding a good name is tricky. ;-)
>>>>>>>>>>
>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure
>>>>>>>>>> whether I like the `StreamsBuilder` suggestion (i.e. any name that
>>>>>>>>>> does not include "topology" or a similar term) that much more.  It
>>>>>>>>>> still doesn't describe what that class actually does, and what the
>>>>>>>>>> difference to `KafkaStreams` is.  IMHO, the point of
>>>>>>>>>> `KStreamBuilder` is that it lets you build a logical plan (what we
>>>>>>>>>> call "topology"), and `KafkaStreams` is the thing that executes
>>>>>>>>>> that plan.  I'm not yet convinced that abstracting these two points
>>>>>>>>>> away from the user is a good idea if the argument is that it's
>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure is
>>>>>>>>>> actually true).
>>>>>>>>>>
>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less
>>>>>>>>>> technically correct names", I'd argue we should not even use
>>>>>>>>>> something like "Builder".  We could, for example, also pick the
>>>>>>>>>> following names:
>>>>>>>>>>
>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>   logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>> - KafkaStreamsRunner as the new name for the executor of the plan,
>>>>>>>>>>   with `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>
>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology
>>>>>>>>>>>> than any particular new replacement.
>>>>>>>>>>>>
>>>>>>>>>>>> -Jay
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's the current name and I personally think it's not the
>>>>>>>>>>>>> best one. The main reason why I don't like KStreamsBuilder is that
>>>>>>>>>>>>> we have the concepts of KStreams and KTables, and the builder
>>>>>>>>>>>>> creates both. However, the name puts the focus on KStream and
>>>>>>>>>>>>> devalues KTable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand your argument, and I am personally open to removing
>>>>>>>>>>>>> the "Topology" part, and naming it "StreamsBuilder". Not sure what
>>>>>>>>>>>>> others think about this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think
>>>>>>>>>>>>> it's out of scope for this KIP. KIP-120 has the focus on removing
>>>>>>>>>>>>> leaking internal APIs and doing some cleanup of how our API
>>>>>>>>>>>>> reflects some concepts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I added your idea to the API discussion Wiki page and we
>>>>>>>>>>>>> can take it from there:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   1. This is a minor thing but the proposed new name for
>>>>>>>>>>>>>>   KStreamBuilder is StreamsTopologyBuilder. I actually think we
>>>>>>>>>>>>>>   should not put topology in the name as topology is not a
>>>>>>>>>>>>>>   concept you need to understand at the kstreams layer right now.
>>>>>>>>>>>>>>   I'd think of three categories of concepts: (1) concepts you
>>>>>>>>>>>>>>   need to understand to get going even for a simple example, (2)
>>>>>>>>>>>>>>   concepts you need to understand to operate and debug a real
>>>>>>>>>>>>>>   production app, (3) concepts we truly abstract and you don't
>>>>>>>>>>>>>>   need to ever understand. I think in the kstream layer
>>>>>>>>>>>>>>   topologies are currently category (2), and this is where they
>>>>>>>>>>>>>>   belong. By introducing the name in even the simplest example it
>>>>>>>>>>>>>>   means the user has to go read about topologies to really
>>>>>>>>>>>>>>   understand even this simple snippet. What if instead we called
>>>>>>>>>>>>>>   it KStreamsBuilder?
>>>>>>>>>>>>>>   2. For the processor api, I think this api is mostly not for
>>>>>>>>>>>>>>   end users. However there are a couple of cases where it might
>>>>>>>>>>>>>>   make sense to expose it. I think users coming from Samza, or
>>>>>>>>>>>>>>   JMS's MessageListener (
>>>>>>>>>>>>>>   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>   understand a simple callback interface for message processing.
>>>>>>>>>>>>>>   In fact, people often ask why Kafka's consumer doesn't provide
>>>>>>>>>>>>>>   such an interface. I'd argue we do, it's KafkaStreams. The only
>>>>>>>>>>>>>>   issue is that the processor API documentation is a bit scary
>>>>>>>>>>>>>>   for a person implementing this type of api. My observation is
>>>>>>>>>>>>>>   that people using this style of API don't do a lot of
>>>>>>>>>>>>>>   cross-message operations, they just do single message
>>>>>>>>>>>>>>   operations and use a database for anything that spans messages.
>>>>>>>>>>>>>>   They also don't factor their code into many MessageListeners
>>>>>>>>>>>>>>   and compose them, they just have one listener that has the
>>>>>>>>>>>>>>   complete handling logic. Say I am a user who wants to implement
>>>>>>>>>>>>>>   a single Processor in this style. Do we have an easy way to do
>>>>>>>>>>>>>>   that today (either with the .transform/.process methods in
>>>>>>>>>>>>>>   kstreams or with the topology apis)? Is there anything we can
>>>>>>>>>>>>>>   do in the way of trivial helper code to make this better? Also,
>>>>>>>>>>>>>>   how can we explain that pattern to people? I think currently we
>>>>>>>>>>>>>>   have pretty in-depth docs on our apis but I suspect a person
>>>>>>>>>>>>>>   trying to figure out how to implement a simple callback might
>>>>>>>>>>>>>>   get a bit lost trying to figure out how to wire it up. A simple
>>>>>>>>>>>>>>   five line example in the docs would probably help a lot. Not
>>>>>>>>>>>>>>   sure if this is best addressed in this KIP or is a side
>>>>>>>>>>>>>>   comment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did prepare a KIP to do some cleanup of Kafka's Streaming API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
>
