Bhavesh,

I'd rephrase that a little bit. The new producer absolutely does allow the
user to use any partitioning strategy. However, the mirror maker currently
does not expose that functionality and uses only hash-based partitioning.
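
For context, here is a minimal sketch of what hash-based partitioning
boils down to (illustrative only; the exact hash function differs between
the old producer, which uses the key's hashCode(), and the new producer,
which hashes the serialized key bytes with murmur2):

    // Illustrative sketch, not Kafka's exact implementation.
    static int choosePartition(byte[] serializedKey, int numPartitions) {
        int hash = java.util.Arrays.hashCode(serializedKey); // stand-in hash
        return (hash & Integer.MAX_VALUE) % numPartitions;   // non-negative modulo
    }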

It would be helpful to understand the specific use case for allowing
pluggable partitioning strategies for mirror maker, though. Could you
elaborate on that requirement?

Thanks,
Neha


On Tue, Aug 12, 2014 at 9:29 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> With the new producer, it will still do hash-based partitioning on the
> keys if the messages have keys. However, it is a bit harder to customize
> the partitioning logic, as the new producer does not expose the
> partitioner any more.
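>
> For contrast, the old producer's partitioning was pluggable via the
> partitioner.class property. A rough sketch (based on the 0.8 docs; treat
> the interface details as illustrative):
>
>     // Old (0.8.x) producer: set partitioner.class to this class name.
>     import kafka.producer.Partitioner;
>     import kafka.utils.VerifiableProperties;
>
>     public class MyPartitioner implements Partitioner {
>         // Kafka instantiates the partitioner reflectively with the
>         // producer's properties.
>         public MyPartitioner(VerifiableProperties props) {}
>
>         public int partition(Object key, int numPartitions) {
>             // Default-style behavior: non-negative hash modulo partition count.
>             return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
>         }
>     }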
>
> Guozhang
>
>
> On Mon, Aug 11, 2014 at 11:12 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
> wrote:
>
> > Hi Neha and Guozhang,
> >
> > As long as stickiness is maintained consistently to a particular
> > partition in the target DC, that is great, so we can do per-DC and
> > across-DC aggregation.
> >
> > How about non-hash-based, i.e. range-based, partitioning? For example,
> > if the key starts with "a" then send the message to partitions 1 to 10,
> > if the key starts with "b" then partitions 11 to 20, and so on and so
> > forth...
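> >
> > A hypothetical sketch of that key-prefix scheme, assuming the old
> > producer's pluggable Partitioner interface (names and ranges are made
> > up for illustration):
> >
> >     // Keys starting with 'a' -> partitions 0-9, 'b' -> 10-19, etc.
> >     // (0-indexed, so "1 to 10" above becomes 0 to 9 here.)
> >     // Assumes keys start with a lowercase ASCII letter.
> >     public int partition(Object key, int numPartitions) {
> >         String k = key.toString();
> >         int bucket = k.charAt(0) - 'a';                        // 0 for 'a', 1 for 'b', ...
> >         int base = (bucket * 10) % numPartitions;              // start of that key's range
> >         int offset = (k.hashCode() & Integer.MAX_VALUE) % 10;  // spread within the range
> >         return (base + offset) % numPartitions;
> >     }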
> >
> > In this case, how does MM handle copying data? Just FYI for now: we are
> > in the process of upgrading to the new producer, so how will MM
> > distribute data to the target DC if the partition numbers are different,
> > etc.? Basically, how can I inject my custom partitioning logic into MM?
> >
> > Thanks for your help!!
> >
> > Thanks,
> >
> > Bhavesh
> >
> >
> > On Mon, Aug 11, 2014 at 10:20 PM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Bhavesh,
> > >
> > > As Neha said, with more partitions on the destination brokers, events
> > > that belong to the same partition in the source cluster may be
> > > distributed to different partitions in the destination cluster.
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Aug 11, 2014 at 9:35 PM, Neha Narkhede <neha.narkh...@gmail.com>
> > > wrote:
> > >
> > > > Bhavesh,
> > > >
> > > > For keyed data, the mirror maker will just distribute data based on
> > > > hash(key) % num_partitions. If num_partitions is different in the
> > > > target DC (which it is), a message that lived in partition 0 in the
> > > > source cluster might end up in partition 10 in the target cluster.
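> > > >
> > > > For example (illustrative numbers only): if hash(key) = 224, then
> > > > 224 % 32 = 0 in a 32-partition source cluster, but 224 % 100 = 24 in
> > > > a 100-partition target cluster, so the "same" message lands in a
> > > > different partition after mirroring.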
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Mon, Aug 11, 2014 at 7:23 PM, Bhavesh Mistry <
> > > > mistry.p.bhav...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > We are using Kafka 0.8.1 for all producers, consumers, and MM.
> > > > >
> > > > > We have 32 partitions per source (local) DC and 100 in the target
> > > > > (central) DC.
> > > > >
> > > > > Is there any configuration on MM for this, etc.?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Bhavesh
> > > > >
> > > > >
> > > > > On Mon, Aug 11, 2014 at 4:33 PM, Guozhang Wang <wangg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Bhavesh,
> > > > > >
> > > > > > What is the number of partitions on the source and target
> > > > > > clusters, and what version of Kafka MM are you using?
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 11, 2014 at 1:21 PM, Bhavesh Mistry <
> > > > > > mistry.p.bhav...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Kafka Dev Team,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > We have to aggregate events (count) per DC and across DCs for
> > > > > > > one of our topics. We have the standard LinkedIn data pipeline:
> > > > > > > producers --> local brokers --> MM --> central brokers.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > So I would like to know how MM handles messages when custom
> > > > > > > partitioning logic is used, as below, and the number of
> > > > > > > partitions in the target DC is the SAME vs. different from the
> > > > > > > source DC?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > If we have key-based messages and custom partitioning logic
> > > > > > > (hash(key) % number of partitions of the source topic), we want
> > > > > > > to count similar events by hashing them to the same partition.
> > > > > > > But when the same event is mirrored to the target DC, will it
> > > > > > > go to the same partition even though the number of partitions
> > > > > > > is different in the target DC (meaning, will MM use hash(key) %
> > > > > > > number of partitions)?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > According to this reference, I do not have a way to configure
> > > > > > > this or to control which partitioning logic MM uses when
> > > > > > > mirroring data:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Bhavesh
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>
