Re: KIP-669: Preserve Source Partition in Kafka Streams from context

Balan k Thu, 17 Sep 2020 07:47:51 -0700

Thanks Matthias for the reply
I think i like the idea of the ability to use Record context in the default 
partitioner itself. I will join the discussion for KIP 478 to understand the 
context.


On 2020/09/11 21:31:46, "Matthias J. Sax" <mj...@apache.org> wrote: 
> With regard to KIP-478, there is the idea to introduce a `RecordContext`
> class.
> 
> Thus, we could just change the `StreamPartitioner` to take this new
> class as parameter to `partition()`? This might actually kill two birds
> with one stone, because I could imagine use cases in which users want to
> partition based on header information that is currently not exposed either.
> 
> For this case, we don't even need to provide any default implementation
> of `StreamPartitioner` but users can just implement it themselves. The
> use case itself makes sense, but it does not seem to be generic enough
> that we need to provide an out-of-the-box implementation for it.
> 
> 
> -Matthias
> 
> On 9/10/20 3:59 PM, Sophie Blee-Goldman wrote:
> > Hey Balan, thanks for the KIP!
> > 
> > The motivation here makes sense to me, but I have a few questions about the
> > proposed API
> > 
> > I guess the main thing to point out is that if we just add new addSink()
> > overloads to Topology,
> > then only the lower level Processor API will benefit and users of the DSL
> > won't be able to utilize
> > this. This seems like a useful feature that we should make available to
> > anyone.
> > 
> > We could follow a similar approach and add new toStream overloads to the
> > KStream class, but
> > that would expand the surface area of the API pretty significantly. The
> > additional addSink()
> > overloads alone would do this. The addSink() methods already have a pretty
> > large number
> > of optional parameters which means more and more overloads every time a new
> > one is added.
> > We should avoid making this problem worse wherever possible.
> > 
> > Existing StreamPartitioner  in SinkNode will be made null when context
> >> partition is enabled
> > 
> > 
> > This line from your KIP gave me some idea that it might be avoidable in
> > this case. The implication
> > of this quote is that the StreamPartitioner and useContextPartition
> > parameter are inherently
> > incompatible since they are two ways of specifying the same thing, the
> > target partition. Well, if
> > that's the case, then we should be able to just leverage the existing
> > StreamPartitioner in some
> > way to specify that we want records to end up in the source partition,
> > without introducing a new
> > parameter.
> > 
> > One option would be to just let users pass in a null StreamPartitioner to
> > mean that it should
> > use the source partition, but that seems a bit subtle. Maybe a better API
> > would be to offer
> > a new out-of-the-box StreamPartitioner called SourceContextPartitioner (or
> > something),
> > and then users just have to pass in an instance of this. WDYT?
> > 
> > On Thu, Sep 10, 2020 at 8:00 AM Balan k <ksbalan2...@gmail.com> wrote:
> > 
> >>
> >> Forgot to add the link
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-669%3A+Preserve+Source+Partition+in+Kafka+Streams+from+context
> >>
> >>
> >>
> >> On 2020/09/10 13:40:02, satyanarayan komandur <ksbalan2...@gmail.com>
> >> wrote:
> >>> Hi,
> >>>
> >>> I have submitted a new KIP for preserving processor record context
> >> partition from source. I am looking for suggestions/comments.
> >>>
> >>> In most use cases where source message is getting transformed and sent
> >> to a target topic, where
> >>> 1. number of partitions on source and sink topic are same
> >>> 2. there is no change to the key
> >>> 3. more importantly if we would like to preserve the partition as is
> >> without re-deriving using partition from context would help.
> >>>
> >>> I am aware of one caveat where record processor context partition is not
> >> known in stream punctuation.
> >>>
> >>> Please look over the KIP and chime in more ideas
> >>>
> >>> Thanks
> >>> Balan
> >>>
> >>>
> >>>
> >>
> > 
> 
>

Re: KIP-669: Preserve Source Partition in Kafka Streams from context

Reply via email to