Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Matthias J. Sax Mon, 13 Mar 2017 13:34:07 -0700

Guozhang,

1) I updated the KIP to option (b).


3) Agreed. So we leave this part out, and tackle it within KIP-130


-Matthias


On 3/12/17 3:48 PM, Guozhang Wang wrote:
> Thanks Matthias.
> 
> 1) Given that TopologyDescription is for debugging purposes before
> `KafkaStreams.start()`. I think the simplest option b) may be sufficient.
> Just needs to emphasize its possible value semantics in Java docs.
> 
> 3) You can tell that I was actually thinking about this together with
> KIP-130. To me if we can expose the runtime information, which is dynamic,
> via metrics in KIP-130 then we could remove this function. The main reason
> is that, again, the task migration make this function's behavior a bit
> difficult to explain. For example:
> 
> streams.start();
> 
> sleep(/* some time */)
> 
> streams.toString();
> 
> ---------------------------
> 
> Even with the same configuration, depending on for how long did you wait
> after started, the function could return very different string results due
> to rebalances.
> 
> That being said, I was not trying to make the decision in this KIP, as I
> saw it more related to KIP-130. So we could probably still keep it as is in
> KIP-120, and consider removing it in KIP-130. That's why I was just "asking
> your thoughts on this", but not necessary wanting to make an action in this
> KIP.
> 
> 
> Guozhang
> 
> 
> 
> On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Thanks for your feedback Guozhang.
>>
>>
>> 1) There are multiple ways to do this. Let me know what you think about
>> all options:
>>
>> (a) I updated the KIP to this:
>>
>>>     public final class Source implements Node {
>>>         public final String name;
>>>         // topicNames and topicPattern are mutually exclusive, i.e.,
>> only one will be not-null
>>>         public final List<String> topicNames; // null if #addSource(...,
>> Pattern) was used
>>>         public final Pattern topicPattern; // null if #addSource(...,
>> String...) was used
>>>     }
>>
>> (b) We could also go with a single variable (as originally proposed).
>> This would have the advantage (compared to (a)), that null checks are
>> not required accessing TopologyDescription#Source class.
>>
>>> String topics; // can be comma separated list of topic names or pattern
>> (as String)
>>
>> However, with an encoded list or an encoded pattern it's required to
>> parse the string again, what we want to avoid in the first place.
>>
>> (c) Use a single variable as in (b)
>>
>>> String topics; // always a pattern (as String)
>>
>> We translate a list of topic names into a pattern
>> "topic1|topic2|topic3". We loose the information if the source was added
>> via list or via pattern.
>>
>>
>>
>> 2) Your understanding is correct. Added a comment to the KIP.
>>
>>
>>
>> 3) I would keep KafkaStreams#toString() -- it's conceptually two
>> different things and runtime information is useful, too. But as its
>> return value is ambiguous to parse (and must be parsed in the first
>> place what is cumbersome), we could add KafkaStreams#describe() as
>>
>>> public synchronized KafkaStreamsDescription describe();
>>
>> KafkaStreamsDescription class would be similar to TopologyDescription to
>> allow programmatic access to runtime information. I guess we could even
>> reuse (parts of) TopologyDescription within KafkaStreamsDescription to
>> avoid code duplication.
>>
>> If you think this would be useful, I can extend the KIP accordingly.
>>
>>
>>
>> -Matthias
>>
>>
>>
>>
>> On 3/10/17 1:38 PM, Guozhang Wang wrote:
>>> One more question here:
>>>
>>> 3. with TopologyDescription, do we still want to keep the
>>> `KafkaStream.toString()` function? I think it may still have some
>> advantage
>>> such that it contains tasks information after `KafkaStream#start()` has
>>> been called, but much of it is duplicate with the TopologyDescription,
>> and
>>> it is only in the form of the string hence hard to programmatically
>>> leverage. So would like to hear your thoughts.
>>>
>>> Guozhang
>>>
>>> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wangg...@gmail.com>
>> wrote:
>>>
>>>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor
>> comments:
>>>>
>>>> 1. With regard to this class:
>>>>
>>>>     public final class Source implements Node {
>>>>         public final String name;
>>>>         public final String topic; // can be topic name or pattern (as
>>>> String)
>>>>     }
>>>>
>>>> Note that the source node could contain more than a single topic, i.e. a
>>>> list of topics besides a pattern.
>>>>
>>>> 2. With regard to
>>>>
>>>>           public synchronized TopologyDescription describe();
>>>>
>>>> My understand is that whenever the topology is modified, one needs to
>> call
>>>> this function again to get a new description object, as the old one
>> won't
>>>> be updated automatically. Hence the usage pattern would be:
>>>>
>>>> TopologyDescription description = topology.describe();
>>>>
>>>> topology.addProcessor(...)
>>>>
>>>> description = topology.describe(); // have to call again
>>>>
>>>> -----------
>>>>
>>>> Is that right? If yes could you clarify this in the wiki?
>>>>
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mich...@confluent.io>
>> wrote:
>>>>
>>>>> Thanks for the update, Matthias.
>>>>>
>>>>> +1 to the points 1,2,3,4 you mentioned.
>>>>>
>>>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>>>> aspect
>>>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>>>> KStreamBuilder actually builds a topology (though currently the latter
>> is
>>>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <
>> matth...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> sorry for not replying earlier and thanks for all your feedback. After
>>>>>> some more discussions I updated the KIP. The new proposal puts some
>>>>>> other design considerations into account, that I want to highlight
>>>>>> shortly. Those considerations, automatically resolve the concerns
>>>>> raised.
>>>>>>
>>>>>> First some answers:
>>>>>>
>>>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>>>> KTable
>>>>>>> internals.  I wouldn't be able to convert them to
>>>>> process()/transform().
>>>>>>>
>>>>>>> What's the harm in permitting both APIs to be used in the same
>>>>>> application?
>>>>>>
>>>>>> It's not about "harm" but about design. We want to switch from a
>>>>>> "inheritance" to a "composition" pattern.
>>>>>>
>>>>>> About the interface idea: using a shared interface would not help to
>> get
>>>>>> a composition pattern
>>>>>>
>>>>>>
>>>>>> Next I want to give the design considerations leading to the updated
>>>>> KIP:
>>>>>>
>>>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is
>> unnatural.
>>>>>> KafkaStreams client executes a `Topology` and this execution should be
>>>>>> independent of the way the topology is "put together", ie, low-level
>> API
>>>>>> or DSL.
>>>>>>
>>>>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
>>>>>>
>>>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns
>> a
>>>>>> `Topology` that can be passed into KafakStreams.
>>>>>>
>>>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>>>>>> rename the new class to `StreamsTopologyBuilder` (the name
>>>>>> TopologyBuilder would actually be more natural, but would be easily
>>>>>> confused with old low-level API TopologyBuilder).
>>>>>>
>>>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
>>>>>> StreamsTopologyBuilder return the created Topology via #build().
>>>>>>
>>>>>> I also removed `final` for both builder classes.
>>>>>>
>>>>>>
>>>>>>
>>>>>> With regard to the larger scope of the overal API redesign, I also
>> want
>>>>>> to point to a summary of API issues:
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>> Kafka+Streams+Discussions
>>>>>>
>>>>>> Thus, this KIP is only one building block of a larger improvement
>>>>>> effort, and we hope to get as much as possible done for 0.11. If you
>>>>>> have any API improvement ideas, please share them so we can come up
>> with
>>>>>> an holistic sound design (instead of uncoordinated local improvements
>>>>>> that might diverge)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking forward to your feedback on this KIP and the other API issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>>>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>>>>> matth...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
>>>>> because
>>>>>>>> we think #transform() should provide all functionality you need to
>>>>>>>> mix-an-match Processor API and DSL. If there is any further concern
>>>>>>>> about this, please let us know.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>>>>>> concerns
>>>>>>> about this.  You're correct to point out that transform() can be used
>>>>> for
>>>>>>> some of the output situations I pointed out; albeit it seems somewhat
>>>>>>> awkward to do so in a "transform" method; what do you do with the
>>>>> retval?
>>>>>>>
>>>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>>>> KTable
>>>>>>> internals.  I wouldn't be able to convert them to
>>>>> process()/transform().
>>>>>>>
>>>>>>> What's the harm in permitting both APIs to be used in the same
>>>>>> application?
>>>>>>>
>>>>>>> Mathieu
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>>
>>>
>>
>>
> 
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Reply via email to