Any thoughts on this KIP?

On Thu, Jun 17, 2021 at 1:38 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
wrote:

> Another reason why I think adding a configuration for each internal topic is 
> not a good solution is how MM2 is naming these topics at the moment.
> Right now MM2 sets the name of the offset-syncs topic to 
> mm2-offset-syncs.<cluster-alias>.internal and for checkpoints is 
> <cluster-alias>.checkpoints.internal so the name has a pattern to link it 
> back to the herder of source -> target mirror link so having this in 
> configuration will lead to
> 1. having a method that determines the final name of internal topics for 
> backward compatibility and have this method to be the default of the 
> configuration values. The challenge here is that we need to load first the 
> clusters alias to calculate the default value for offset-syncs.topic.name and 
> checkpoints.topic.name.
> 2. Consider use cases where MM2 is used to mirror between multiple clusters, 
> for example:
>         source1 -> target.enabled = true
>         source2 -> target.enabled = true
> For this use-case the current behaviour will create the following 
> offset-syncs and checkpoints on each cluster:
> source1 cluster
>         - mm2-offset-syncs.target.internal
> source2 cluster
>         - mm2-offset-syncs.target.internal
> target cluster
>         - source1.checkpoints.internal
>         - source2.checkpoints.internal
> As MM2 design in the original KIP-382 is spliting internal topics bsed on 
> mirroring links. Now if we let MM2 users set the full name of these topics as 
> configuration, how will we detect if the user has a wrong configuration where 
> they used the same name for checkpoints topic for both source1 and source2. 
> How this will work if both source1 and source2 clusters have consumer groups 
> with same ids as checkpoints topic messages contains consumer group id? 
> Should we warn the MM2 user that this topic has been used before for another 
> source cluster? If not how will the MM2 user notice that?
>
>
> On Mon, Jun 14, 2021 at 5:54 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> wrote:
>
>> Hi folks, let me try to clarify some of your concerns and questions.
>>
>> Mickael: Have you considered making names changeable via configurations?
>>>
>>
>> Gwen: may be missing something, but we are looking at 3 new configs (one
>>> for each topic). And this rejected alternative is basically identical to
>>> what Connect already does (you can choose names for internal topics using
>>> configs).
>>>
>>> These are valid points. The reasons why we should prefer an interface
>> (the current proposal is using the ReplicationPolicy interface which
>> already exists in MM2) instead are
>>
>> 1. the number of configurations that MM2 has. Right now MM2 has its own
>> set of configuration in addition to configuration for admin, consumer and
>> producer clients and Connect API. And these configurations in some
>> use-cases could be different based on the herder.
>>
>> Consider a use case where MM2 is used to mirror between a set of clusters
>> running by different teams and have different naming policies. So if we are
>> using 3 configurations for internal topics for a use case like below the
>> configuration will be like this. If the number of policies grows, the
>> amount of configuration can get unwieldy.
>>
>> clusters = newCenterCluster, teamACluster, teamBCluster, ...
>>
>> //newCenterCluster policy is <teamName>.<topicName>
>> //teamACluster naming policy is <app>_<topicName> when move to 
>> newCenterCluster it will be teamA.<app>_<topicName>
>> //teamBCluster naming policy is <domain>.<topicName> when move to 
>> newCenterCluster it will be teamB.<domain>_<topicName>
>>
>> //The goal is to move all topics from team-specific cluster to one new 
>> cluster
>> // where the org can unify resource management and naming conventions
>>
>> replication.policy.class=MyCustomReplicationPolicy
>>
>> teamACluster.heartbeat.topic=mm2_heartbeat_topic // created on source cluster
>> teamACluster->newCenterCluster.offsets-sync.topic=mm2_my_offset_sync_topic 
>> //created on source cluster at the moment
>> teamACluster->newCenterCluster.checkpoints.topic=teamA.mm2_checkpoint_topic 
>> //created on target cluster which newCenterCluster
>>
>> teamBCluster.heartbeat.topic=mm2.heartbeat_topic // created on source cluster
>> teamBCluster->newCenterCluster.offsets-sync.topic=mm2.my_offset_sync_topic 
>> //created on source cluster at the moment
>> teamBCluster->newCenterCluster.checkpoints.topic=teamB.mm2_checkpoint_topic 
>> //created on target cluster which newCenterCluster
>>
>> teamACluster.config.storage.topic=...
>> teamACluster.offset.storage.topic=...
>> teamACluster.status.storage.topic=...
>>
>> teamBCluster.config.storage.topic=...
>> teamBCluster.offset.storage.topic=...
>> teamBCluster.status.storage.topic=...
>>
>>
>> teamACluster→newCenterCluster.enabled=true
>> teamACluster→newCenterCluster.enabled=true
>>
>> 2. The other reason is what Mickael mentioned regards a future KIP to
>> move offset-syncs on the target cluster.
>>
>>> Mickael: I'm about to open a KIP to specify where the offset-syncs topic
>>> is created by MM2. In restricted environments, we'd prefer MM2 to only have
>>> read access to the source cluster and have the offset-syncs on the target
>>> cluster. I think allowing to specify the cluster where to create that topic
>>> would be a natural extension of the interface you propose here.
>>>
>>
>> In this case, where you want to achieve “read-only” on the source cluster
>> then using ReplicationPolicy to name the offset-syncs topic makes more
>> sense as ReplicationPolicy holds the implementation of how topics will be
>> named on the target cluster where MM2 will have “write” access.
>> For example, the user can provide their own naming convention for the
>> target cluster as part of replication.policy.class =
>> MyCustomReplicationPolicy where it formate the name of the replicated
>> topic to be <something>.<topic>.<something>.Now if we have also an extra
>> config for internal topics that will also be created at target with the
>> similar naming convention <something>.<internal_topic>.<something> this
>> means the user will need to add
>> offset-sync.topic=<something>.<offset_topic_name>.<something> this feels
>> like a duplication for me as we could achieve it by re-using the logic from
>>  ReplicationPolicy
>>
>> So using the replication interface we can define
>> MyCustomReplicationPolicy like this following
>>
>> public class *MyCustomReplicationPolicy* implements ReplicationPolicy {
>>     @Override
>>     //How to rename remote topics
>>     public String *formatRemoteTopic*(String sourceClusterAlias, String 
>> topic) {
>>         return nameTopicOnTarger(sourceClusterAlias, topic, "mirrored");
>>     }
>>
>>     @Override
>>     String *offsetSyncTopic*(String clusterAlias) {
>>         // offset-sync topic will be created on target cluster so it need to 
>> follow
>>         // naming convention of target cluster
>>         return nameTopicOnTarget(clusterAlias, "mm2-offset-sync", 
>> "internal");
>>     }
>>
>>     @Override
>>     String *checkpointsTopic*(String clusterAlias) {
>>         // checkpoints topic is created on target cluster so it need to 
>> follow
>>         // naming convention of target cluster
>>         return nameTopicOnTarget(clusterAlias, "mm2-checkpoints", 
>> "internal");
>>     }
>>
>>     private String *nameTopicOnTarget*(String prefix, String topic, String 
>> suffix) {
>>             return prefix + separator + topic + seprator + suffix;
>>     }
>> }
>>
>> and MM2 configs will be
>>
>> *newCenterCluster.replication.policy.class=MyCustomReplicationPolicy*
>>
>> Mickael: I understand an interface gives a lot of flexibility but I'd
>>> expect topic names to be relatively simple and known in advance in most
>>> cases.
>>>
>>
>> Gwen: How is it too complex?
>>>
>>
>> The reason I opted for this solution wasn't about complexity but rather
>> the number of configurations that MM2 already has. In addition to the point
>> above regarding Mickael's future KIP to move offset-sync topic target
>> cluster.
>>
>>
>> Gwen: There is a discussion about how this gets more complicated in some
>>> scenarios, but it was a bit abstract - is there a concrete case that shows
>>> why the configuration is more complicated than implementing a plugin?
>>>
>>
>> The use case I have in mind is something like my example above. Also
>> considering that Mickael is planning to open a KIP to restrict the access
>> to the source cluster and have the offset mapping topics on the target
>> cluster this means using the ReplicationPolicy that handle naming policies
>> at the target cluster makes it easier because if the target cluster has a
>> restricted naming convention it will be applied on all topics created there.
>>
>> Another consideration is a future KIP I’ll be raising soon to abstract
>> how MM2 creates topics on target cluster, say for example to integrate with
>> a centralized resource management system. A custom ReplicationPolicy will
>> give users more flexibility in this kind of integration and having to
>> manage what can potentially be a lot less configuration.
>>
>>
>> I’m not strongly attached to using an interface as the solution for
>> naming internal topics however, I feel it will give us more flexibility
>> with future KIPs.
>>
>> I would appreciate getting feedback from you both. And if the majority
>> feels like the configurations are easier I can change the proposal to this.
>>
>> --
>> Omnia
>>
>>
>> On Sat, May 15, 2021 at 5:59 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
>> wrote:
>>
>>> Hey Gwen,
>>> Thanks for having a look into the KIP, regard your question
>>>
>>>> is there a concrete case that shows why configuration is more
>>>> complicated than implementing a plugin?
>>>>
>>>
>>> The case I had in mind is where an organisation with medium-to-large
>>> size have multiple teams running their own Kafka clusters without a
>>> centralized team that runs Kafka and enforce the same rule for naming
>>> resources everywhere.
>>> In this environment, if the organization for example is trying to use
>>> MM2 to migrate all their data into one place or the data is needed on both
>>> clusters for any business use cases (like needs aggregate data from
>>> different clusters), adding extra config for each internal topic will
>>> increase the amount of configuration need to run MM2 for this case.
>>>
>>> The other concerns I have in general are
>>>
>>> 1. The popular usage of MM2 is replicating topics between clusters
>>> running by the same team, in this case, it's most likely that if this team
>>> has a naming topic's rule, this same rule will be applied to both
>>> replicated and some of the internal topics of MM2. If we added config
>>> like connect to each internal topic then these customers will end up adding
>>> 4 configs just to handle the same naming rule, 1 to include customised
>>> replication policy for naming replicated topics + extra 3 configs for the
>>> internals to match the same rule.
>>>
>>> 2. The replication policy plugin already implemented in the MM2, and
>>> many customers have already their own customised implementation (there're
>>> few implementations flying around already for this policy in the community)
>>> so we aren't adding extra configuration instead we are expanding the
>>> responsibility of the policy.
>>>
>>> Am happy to make it align with connect configs if others disagree with
>>> my concerns, at the end my number one goal is to make it flexible to name
>>> these topics. I opt-in for an interface based solution so I can minimize
>>> the number of config customers need to add.
>>>
>>> Omnia
>>>
>>>
>>> On Thu, May 13, 2021 at 9:30 PM Gwen Shapira <g...@confluent.io.invalid>
>>> wrote:
>>>
>>>> Hey, sorry for arriving late, but I have a question about the rejected
>>>> alternative involving configuration.
>>>>
>>>> I may be missing something, but we are looking at 3 new configs (one for
>>>> each topic). And this rejected alternative is basically identical to
>>>> what
>>>> Connect already does (you can choose names for internal topics using
>>>> configs). How is it too complex? As a user, configuring 3 fields seems
>>>> much
>>>> simpler than implementing a class. As an admin, trusting to run
>>>> someone's
>>>> code is scary, but a config with topic name is very safe and easy to
>>>> test
>>>> and trust.
>>>>
>>>> There is a discussion about how this gets more complicated in some
>>>> scenarios, but it was a bit abstract - is there a concrete case that
>>>> shows
>>>> why configuration is more complicated than implementing a plugin?
>>>>
>>>> Gwen
>>>>
>>>> On Fri, Apr 30, 2021 at 2:30 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Ryanne, Can you vote for it here
>>>> > https://www.mail-archive.com/dev@kafka.apache.org/msg113575.html as
>>>> well,
>>>> > please?
>>>> >
>>>> > On Fri, Apr 30, 2021 at 12:44 AM Ryanne Dolan <ryannedo...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Thanks Omnia. lgtm!
>>>> > >
>>>> > > Ryanne
>>>> > >
>>>> > > On Thu, Apr 29, 2021 at 10:50 AM Omnia Ibrahim <
>>>> o.g.h.ibra...@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > >> I updated the KIP
>>>> > >>
>>>> > >> On Thu, Apr 29, 2021 at 4:43 PM Omnia Ibrahim <
>>>> o.g.h.ibra...@gmail.com>
>>>> > >> wrote:
>>>> > >>
>>>> > >>> Sure, this would make it easier, we can make these functions
>>>> returns
>>>> > the
>>>> > >>> original behaviour (<clusterAlias>.checkpoints.internal,
>>>> > >>> "mm2-offset-syncs.<clusterAlias>.internal", heartbeat) without
>>>> any
>>>> > >>> customisation using `replication.policy.separator` and use the
>>>> > separator in
>>>> > >>> the DefaultReplicationPolicy
>>>> > >>>
>>>> > >>> On Wed, Apr 28, 2021 at 1:31 AM Ryanne Dolan <
>>>> ryannedo...@gmail.com>
>>>> > >>> wrote:
>>>> > >>>
>>>> > >>>> Thanks Omnia, makes sense to me.
>>>> > >>>>
>>>> > >>>> > Customers who have their customised ReplicationPolicy will
>>>> need to
>>>> > >>>> add the definition of their internal topics naming convention
>>>> > >>>>
>>>> > >>>> I wonder should we include default impls in the interface to
>>>> avoid
>>>> > that
>>>> > >>>> requirement?
>>>> > >>>>
>>>> > >>>> Ryanne
>>>> > >>>>
>>>> > >>>> On Sun, Apr 25, 2021, 2:20 PM Omnia Ibrahim <
>>>> o.g.h.ibra...@gmail.com>
>>>> > >>>> wrote:
>>>> > >>>>
>>>> > >>>>> Hi Mickael and Ryanne,
>>>> > >>>>> I updated the KIP to add these methods to the ReplicationPolicy
>>>> > >>>>> instead of an extra interface to simplify the changes. Please
>>>> have a
>>>> > look
>>>> > >>>>> and let me know your thoughts.
>>>> > >>>>>
>>>> > >>>>> Thanks
>>>> > >>>>>
>>>> > >>>>> On Tue, Mar 30, 2021 at 7:19 PM Omnia Ibrahim <
>>>> > o.g.h.ibra...@gmail.com>
>>>> > >>>>> wrote:
>>>> > >>>>>
>>>> > >>>>>> *(sorry forgot to Replay to All) *
>>>> > >>>>>> Hi Ryanne,
>>>> > >>>>>> It's a valid concern, I was trying to separate the concerns of
>>>> > >>>>>> internal and replicated policy away from each other and to
>>>> make the
>>>> > code
>>>> > >>>>>> readable as extending ReplicationPolicy to manage both
>>>> internal and
>>>> > >>>>>> replicated topic is a bit odd. Am not against simplifying
>>>> things
>>>> > out to
>>>> > >>>>>> make ReplicationPolicy handling both at the end of the day if
>>>> an
>>>> > MM2 user
>>>> > >>>>>> has a special naming convention for topics it will be
>>>> affecting both
>>>> > >>>>>> replicated and MM2 internal topics.
>>>> > >>>>>>
>>>> > >>>>>> For simplifying things we can extend `ReplicationPolicy` to the
>>>> > >>>>>> following instead of adding an extra class
>>>> > >>>>>>
>>>> > >>>>>>> *public interface ReplicationPolicy {*
>>>> > >>>>>>>     String topicSource(String topic);
>>>> > >>>>>>>     String upstreamTopic(String topic);
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> */** Returns heartbeats topic name.*/    String
>>>> heartbeatsTopic();*
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> *    /** Returns the offset-syncs topic for given cluster
>>>> alias. */
>>>> > >>>>>>>   String offsetSyncTopic(String targetAlias);    /** Returns
>>>> the
>>>> > name
>>>> > >>>>>>> checkpoint topic for given cluster alias. */    String
>>>> > >>>>>>> checkpointTopic(String sourceAlias); *
>>>> > >>>>>>>
>>>> > >>>>>>>     default String originalTopic(String topic) {
>>>> > >>>>>>>         String upstream = upstreamTopic(topic);
>>>> > >>>>>>>         if (upstream == null) {
>>>> > >>>>>>>             return topic;
>>>> > >>>>>>>         } else {
>>>> > >>>>>>>             return originalTopic(upstream);
>>>> > >>>>>>>         }
>>>> > >>>>>>>     }
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> *    /** Internal topics are never replicated. */
>>>> > >>>>>>> isInternalTopic(String topic) *//the implementaion will be
>>>> moved to
>>>> > >>>>>>> `DefaultReplicationPolicy` to handle both kafka topics and MM2
>>>> > internal
>>>> > >>>>>>> topics.
>>>> > >>>>>>> }
>>>> > >>>>>>>
>>>> > >>>>>>
>>>> > >>>>>> On Fri, Mar 26, 2021 at 3:05 PM Ryanne Dolan <
>>>> ryannedo...@gmail.com
>>>> > >
>>>> > >>>>>> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> Omnia, have we considered just adding methods to
>>>> ReplicationPolicy?
>>>> > >>>>>>> I'm reluctant to add a new class because, as Mickael points
>>>> out,
>>>> > we'd need
>>>> > >>>>>>> to carry it around in client code.
>>>> > >>>>>>>
>>>> > >>>>>>> Ryanne
>>>> > >>>>>>>
>>>> > >>>>>>> On Fri, Feb 19, 2021 at 8:31 AM Mickael Maison <
>>>> > >>>>>>> mickael.mai...@gmail.com> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> Hi Omnia,
>>>> > >>>>>>>>
>>>> > >>>>>>>> Thanks for the clarifications.
>>>> > >>>>>>>>
>>>> > >>>>>>>> - I'm still a bit uneasy with the overlap between these 2
>>>> methods
>>>> > as
>>>> > >>>>>>>> currently `ReplicationPolicy.isInternalTopic` already
>>>> handles MM2
>>>> > >>>>>>>> internal topics. Should we make it only handle Kafka internal
>>>> > topics
>>>> > >>>>>>>> and `isMM2InternalTopic()` only handle MM2 topics?
>>>> > >>>>>>>>
>>>> > >>>>>>>> - I'm not sure I understand what this method is used for.
>>>> There
>>>> > are
>>>> > >>>>>>>> no
>>>> > >>>>>>>> such methods for the other 2 topics (offset-sync and
>>>> heartbeat).
>>>> > >>>>>>>> Also
>>>> > >>>>>>>> what happens if there are other MM2 instances using different
>>>> > naming
>>>> > >>>>>>>> schemes in the same cluster. Do all instances have to know
>>>> about
>>>> > the
>>>> > >>>>>>>> other naming schemes? What are the expected issues if they
>>>> don't?
>>>> > >>>>>>>>
>>>> > >>>>>>>> - RemoteClusterUtils is a client-side utility so it does not
>>>> have
>>>> > >>>>>>>> access to the MM2 configuration. Since this new API can
>>>> affect the
>>>> > >>>>>>>> name of the checkpoint topic, it will need to be used
>>>> client-side
>>>> > >>>>>>>> too
>>>> > >>>>>>>> so users can find the checkpoint topic name. I had to
>>>> realized
>>>> > this
>>>> > >>>>>>>> was the case.
>>>> > >>>>>>>>
>>>> > >>>>>>>> Thanks
>>>> > >>>>>>>>
>>>> > >>>>>>>> On Mon, Feb 15, 2021 at 9:33 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com> wrote:
>>>> > >>>>>>>> >
>>>> > >>>>>>>> > Hi Mickael, did you have some time to check my answer?
>>>> > >>>>>>>> >
>>>> > >>>>>>>> > On Thu, Jan 21, 2021 at 10:10 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com> wrote:
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> Hi Mickael,
>>>> > >>>>>>>> >> Thanks for taking another look into the KIP, regards your
>>>> > >>>>>>>> questions
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> - I believe we need both "isMM2InternalTopic" and
>>>> > >>>>>>>> `ReplicationPolicy.isInternalTopic`  as
>>>> > `ReplicationPolicy.isInternalTopic`
>>>> > >>>>>>>> does check if a topic is Kafka internal topic, while
>>>> > `isMM2InternalTopic`
>>>> > >>>>>>>> is just focusing if a topic is MM2 internal topic or
>>>> not(which is
>>>> > >>>>>>>> heartbeat/checkpoint/offset-sync). The fact that the
>>>> default for
>>>> > MM2
>>>> > >>>>>>>> internal topics matches "ReplicationPolicy.isInternalTopic"
>>>> will
>>>> > not be an
>>>> > >>>>>>>> accurate assumption anymore once we implement this KIP.
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> - "isCheckpointTopic" will detect all checkpoint topics
>>>> for all
>>>> > >>>>>>>> MM2 instances this is needed for "MirrorClient.
>>>> checkpointTopics"
>>>> > which
>>>> > >>>>>>>> originally check if the topic name ends with
>>>> > CHECKPOINTS_TOPIC_SUFFIX. So
>>>> > >>>>>>>> this method just to keep the same functionality that
>>>> originally
>>>> > exists in
>>>> > >>>>>>>> MM2
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> - "checkpointTopic" is used in two places 1. At topic
>>>> creation
>>>> > >>>>>>>> in "MirrorCheckpointConnector.createInternalTopics" which
>>>> use
>>>> > >>>>>>>> "sourceClusterAlias() + CHECKPOINTS_TOPIC_SUFFIX" and 2. At
>>>> > >>>>>>>> "MirrorClient.remoteConsumerOffsets" which is called by
>>>> > >>>>>>>> "RemoteClusterUtils.translateOffsets"  the cluster alias
>>>> here
>>>> > referred to
>>>> > >>>>>>>> as "remoteCluster" where the topic name is
>>>> "remoteClusterAlias +
>>>> > >>>>>>>> CHECKPOINTS_TOPIC_SUFFIX"  (which is an argument in
>>>> > RemoteClusterUtils, not
>>>> > >>>>>>>> a config) This why I called the variable cluster instead of
>>>> > source and
>>>> > >>>>>>>> instead of using the config to figure out the cluster aliases
>>>> > from config
>>>> > >>>>>>>> as we use checkpoints to keep `RemoteClusterUtils`
>>>> compatible for
>>>> > existing
>>>> > >>>>>>>> users. I see a benefit of just read the config a find out the
>>>> > cluster
>>>> > >>>>>>>> aliases but on the other side, I'm not sure why
>>>> > "RemoteClusterUtils"
>>>> > >>>>>>>> doesn't get the name of the cluster from the properties
>>>> instead
>>>> > of an
>>>> > >>>>>>>> argument, so I decided to keep it just for compatibility.
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> Hope these answer some of your concerns.
>>>> > >>>>>>>> >> Best
>>>> > >>>>>>>> >> Omnia
>>>> > >>>>>>>> >>
>>>> > >>>>>>>> >> On Thu, Jan 21, 2021 at 3:37 PM Mickael Maison <
>>>> > >>>>>>>> mickael.mai...@gmail.com> wrote:
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> Hi Omnia,
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> Thanks for the updates. Sorry for the delay but I have a
>>>> few
>>>> > >>>>>>>> more
>>>> > >>>>>>>> >>> small questions about the API:
>>>> > >>>>>>>> >>> - Do we really need "isMM2InternalTopic()"? There's
>>>> already
>>>> > >>>>>>>> >>> "ReplicationPolicy.isInternalTopic()". If so, we need to
>>>> > >>>>>>>> explain the
>>>> > >>>>>>>> >>> difference between these 2 methods.
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> - Is "isCheckpointTopic()" expected to detect all
>>>> checkpoint
>>>> > >>>>>>>> topics
>>>> > >>>>>>>> >>> (for all MM2 instances) or only the ones for this
>>>> connector
>>>> > >>>>>>>> instance.
>>>> > >>>>>>>> >>> If it's the later, I wonder if we could do without the
>>>> method.
>>>> > >>>>>>>> As this
>>>> > >>>>>>>> >>> interface is only called by MM2, we could first call
>>>> > >>>>>>>> >>> "checkpointTopic()" and check if that's equal to the
>>>> topic
>>>> > we're
>>>> > >>>>>>>> >>> checking. If it's the former, we don't really know topic
>>>> names
>>>> > >>>>>>>> other
>>>> > >>>>>>>> >>> MM2 instances may be using!
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> - The 3 methods returning topic names have different
>>>> APIs:
>>>> > >>>>>>>> >>> "heartbeatsTopic()" takes no arguments,
>>>> "offsetSyncTopic()"
>>>> > >>>>>>>> takes the
>>>> > >>>>>>>> >>> target cluster alias and "checkpointTopic()" takes
>>>> > >>>>>>>> "clusterAlias"
>>>> > >>>>>>>> >>> (which one is it? source or target?). As the interface
>>>> extends
>>>> > >>>>>>>> >>> Configurable, maybe we could get rid of all the
>>>> arguments and
>>>> > >>>>>>>> use the
>>>> > >>>>>>>> >>> config to find the cluster aliases. WDYT?
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> These are minor concerns, just making sure I fully
>>>> understand
>>>> > >>>>>>>> how the
>>>> > >>>>>>>> >>> API is expected to be used. Once these are cleared, I'll
>>>> be
>>>> > >>>>>>>> happy to
>>>> > >>>>>>>> >>> vote for this KIP.
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> Thanks
>>>> > >>>>>>>> >>>
>>>> > >>>>>>>> >>> On Fri, Jan 8, 2021 at 12:06 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com> wrote:
>>>> > >>>>>>>> >>> >
>>>> > >>>>>>>> >>> > Hi Mickael,
>>>> > >>>>>>>> >>> > Did you get time to review the changes to the KIP? If
>>>> you
>>>> > >>>>>>>> okay with it could you vote for the KIP here ttps://
>>>> > >>>>>>>> www.mail-archive.com/dev@kafka.apache.org/msg113575.html?
>>>> > >>>>>>>> >>> > Thanks
>>>> > >>>>>>>> >>> >
>>>> > >>>>>>>> >>> > On Thu, Dec 10, 2020 at 2:19 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com> wrote:
>>>> > >>>>>>>> >>> >>
>>>> > >>>>>>>> >>> >> Hi Mickael,
>>>> > >>>>>>>> >>> >> 1) That's right the interface and default
>>>> implementation
>>>> > >>>>>>>> will in mirror-connect
>>>> > >>>>>>>> >>> >> 2) Renaming the interface should be fine too
>>>> especially if
>>>> > >>>>>>>> you planning to move other functionality related to the
>>>> creation
>>>> > there, I
>>>> > >>>>>>>> can edit this
>>>> > >>>>>>>> >>> >>
>>>> > >>>>>>>> >>> >> if you are okay with that please vote for the KIP here
>>>> > >>>>>>>>
>>>> https://www.mail-archive.com/dev@kafka.apache.org/msg113575.html
>>>> > >>>>>>>> >>> >>
>>>> > >>>>>>>> >>> >>
>>>> > >>>>>>>> >>> >> Thanks
>>>> > >>>>>>>> >>> >> Omnia
>>>> > >>>>>>>> >>> >> On Thu, Dec 10, 2020 at 12:58 PM Mickael Maison <
>>>> > >>>>>>>> mickael.mai...@gmail.com> wrote:
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> Hi Omnia,
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> Thank you for the reply, it makes sense.
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> A couple more comments:
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> 1) I'm assuming the new interface and default
>>>> > >>>>>>>> implementation will be
>>>> > >>>>>>>> >>> >>> in the mirror-client project? as the names of some of
>>>> > these
>>>> > >>>>>>>> topics are
>>>> > >>>>>>>> >>> >>> needed by RemoteClusterUtils on the client-side.
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> 2) I'm about to open a KIP to specify where the
>>>> > >>>>>>>> offset-syncs topic is
>>>> > >>>>>>>> >>> >>> created by MM2. In restricted environments, we'd
>>>> prefer
>>>> > MM2
>>>> > >>>>>>>> to only
>>>> > >>>>>>>> >>> >>> have read access to the source cluster and have the
>>>> > >>>>>>>> offset-syncs on
>>>> > >>>>>>>> >>> >>> the target cluster. I think allowing to specify the
>>>> > cluster
>>>> > >>>>>>>> where to
>>>> > >>>>>>>> >>> >>> create that topic would be a natural extension of the
>>>> > >>>>>>>> interface you
>>>> > >>>>>>>> >>> >>> propose here.
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> So I wonder if your interface could be named
>>>> > >>>>>>>> InternalTopicsPolicy?
>>>> > >>>>>>>> >>> >>> That's a bit more generic than
>>>> InternalTopicNamingPolicy.
>>>> > >>>>>>>> That would
>>>> > >>>>>>>> >>> >>> also match the configuration setting,
>>>> > >>>>>>>> internal.topics.policy.class,
>>>> > >>>>>>>> >>> >>> you're proposing.
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> Thanks
>>>> > >>>>>>>> >>> >>>
>>>> > >>>>>>>> >>> >>> On Thu, Dec 3, 2020 at 10:15 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com> wrote:
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > Hi Mickael,
>>>> > >>>>>>>> >>> >>> > Thanks for your feedback!
>>>> > >>>>>>>> >>> >>> > Regards your question about having more
>>>> configurations,
>>>> > I
>>>> > >>>>>>>> considered adding
>>>> > >>>>>>>> >>> >>> > configuration per each topic however this meant
>>>> adding
>>>> > >>>>>>>> more configurations
>>>> > >>>>>>>> >>> >>> > for MM2 which already have so many, also the more
>>>> > >>>>>>>> complicated and advanced
>>>> > >>>>>>>> >>> >>> > replication pattern you have between clusters the
>>>> more
>>>> > >>>>>>>> configuration lines
>>>> > >>>>>>>> >>> >>> > will be added to your MM2 config which isn't going
>>>> to be
>>>> > >>>>>>>> pretty if you
>>>> > >>>>>>>> >>> >>> > don't have the same topics names across your
>>>> clusters.
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > Also, it added more complexity to the
>>>> implementation as
>>>> > >>>>>>>> MM2 need to
>>>> > >>>>>>>> >>> >>> > 1- identify if a topic is checkpoints so we could
>>>> list
>>>> > >>>>>>>> the checkpoints
>>>> > >>>>>>>> >>> >>> > topics in MirrorMaker 2 utils as one cluster could
>>>> have
>>>> > X
>>>> > >>>>>>>> numbers
>>>> > >>>>>>>> >>> >>> > checkpoints topics if it's connected to X
>>>> clusters, this
>>>> > >>>>>>>> is done right now
>>>> > >>>>>>>> >>> >>> > by listing any topic with suffix
>>>> > `.checkpoints.internal`.
>>>> > >>>>>>>> This could be
>>>> > >>>>>>>> >>> >>> > done by add `checkpoints.topic.suffix` config but
>>>> this
>>>> > >>>>>>>> would make an
>>>> > >>>>>>>> >>> >>> > assumption that checkpoints will always have a
>>>> suffix
>>>> > >>>>>>>> also having a suffix
>>>> > >>>>>>>> >>> >>> > means that we may need a separator as well to
>>>> > concatenate
>>>> > >>>>>>>> this suffix with
>>>> > >>>>>>>> >>> >>> > a prefix to identify source cluster name.
>>>> > >>>>>>>> >>> >>> > 2- identify if a topic is internal, so it
>>>> shouldn't be
>>>> > >>>>>>>> replicated or track
>>>> > >>>>>>>> >>> >>> > checkpoints for it, right now this is relaying on
>>>> > >>>>>>>> disallow topics with
>>>> > >>>>>>>> >>> >>> > `.internal` suffix to be not replicated and not
>>>> tracked
>>>> > >>>>>>>> in checkpoints but
>>>> > >>>>>>>> >>> >>> > with making topics configurable we need a way to
>>>> define
>>>> > >>>>>>>> what is an internal
>>>> > >>>>>>>> >>> >>> > topic. This could be done by making using a list
>>>> of all
>>>> > >>>>>>>> internal topics
>>>> > >>>>>>>> >>> >>> > have been entered to the configuration.
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > So having an interface seemed easier and also give
>>>> more
>>>> > >>>>>>>> flexibility for
>>>> > >>>>>>>> >>> >>> > users to define their own topics name, define what
>>>> is
>>>> > >>>>>>>> internal topic means,
>>>> > >>>>>>>> >>> >>> > how to find checkpoints topics and it will be one
>>>> line
>>>> > >>>>>>>> config for each
>>>> > >>>>>>>> >>> >>> > herder, also it more consistence with MM2 code as
>>>> MM2
>>>> > >>>>>>>> config have
>>>> > >>>>>>>> >>> >>> > TopicFilter, ReplicationPolicy, GroupFilter, etc as
>>>> > >>>>>>>> interface and they can
>>>> > >>>>>>>> >>> >>> > be overridden by providing a custom implementation
>>>> for
>>>> > >>>>>>>> them or have some
>>>> > >>>>>>>> >>> >>> > config that change their default implementations.
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > Hope this answer your question. I also updated the
>>>> KIP
>>>> > to
>>>> > >>>>>>>> add this to the
>>>> > >>>>>>>> >>> >>> > rejected solutions.
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > On Thu, Dec 3, 2020 at 3:19 PM Mickael Maison <
>>>> > >>>>>>>> mickael.mai...@gmail.com>
>>>> > >>>>>>>> >>> >>> > wrote:
>>>> > >>>>>>>> >>> >>> >
>>>> > >>>>>>>> >>> >>> > > Hi Omnia,
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> >>> >>> > > Thanks for the KIP. Indeed being able to
>>>> configure
>>>> > >>>>>>>> MM2's internal
>>>> > >>>>>>>> >>> >>> > > topic names would be a nice improvement.
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> >>> >>> > > Looking at the KIP, I was surprised you propose
>>>> an
>>>> > >>>>>>>> interface to allow
>>>> > >>>>>>>> >>> >>> > > users to specify names. Have you considered
>>>> making
>>>> > >>>>>>>> names changeable
>>>> > >>>>>>>> >>> >>> > > via configurations? If so, we should definitely
>>>> > mention
>>>> > >>>>>>>> it in the
>>>> > >>>>>>>> >>> >>> > > rejected alternatives as it's the first method
>>>> that
>>>> > >>>>>>>> comes to mind.
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> >>> >>> > > I understand an interface gives a lot of
>>>> flexibility
>>>> > >>>>>>>> but I'd expect
>>>> > >>>>>>>> >>> >>> > > topic names to be relatively simple and known in
>>>> > >>>>>>>> advance in most
>>>> > >>>>>>>> >>> >>> > > cases.
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> >>> >>> > > I've not checked all use cases but something like
>>>> > below
>>>> > >>>>>>>> felt appropriate:
>>>> > >>>>>>>> >>> >>> > > clusters = primary,backup
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> primary->backup.offsets-sync.topic=backup.mytopic-offsets-
>>>> sync
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>> >>> >>> > > On Tue, Dec 1, 2020 at 3:36 PM Omnia Ibrahim <
>>>> > >>>>>>>> o.g.h.ibra...@gmail.com>
>>>> > >>>>>>>> >>> >>> > > wrote:
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > > > Hey everyone,
>>>> > >>>>>>>> >>> >>> > > > Please take a look at KIP-690:
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>>
>>>> >
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-690%3A+Add+additional+configuration+to+control+MirrorMaker+2+internal+topics+naming+convention
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > > > Thanks for your feedback and support.
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > > > Omnia
>>>> > >>>>>>>> >>> >>> > > >
>>>> > >>>>>>>> >>> >>> > >
>>>> > >>>>>>>>
>>>> > >>>>>>>
>>>> >
>>>>
>>>>
>>>> --
>>>> Gwen Shapira
>>>> Engineering Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter | blog
>>>>
>>>

Reply via email to