I updated the KIP to reflect the options we have been discussing since Oct 2021 for people who didn't read the discussion thread.
On Wed, May 11, 2022 at 11:07 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote: > Hi Colin, > I don't mind the idea of MM2 users implementing the AdminClient interface. > However, there're two disadvantages to this. > > 1. Having around 70 methods definitions to have "NotImplemented" is > one downside, and keep up with these if the AdminClient interface changes. > 2. It makes it hard to list what admin functionality MM2 uses as MM2 > interactions with AdminClient in the codebase are in many places. > > I guess it's OK for MM2 users who want to build their admin client to > carry this burden, as I explained in my previous response to the discussion > thread. And we can do some cleanup to the codebase to have all Admin > interactions in MM2 in a utils class or something like that to make it > easier to navigate what MM2 needs from the Admin interface. > > Maybe I'm misunderstanding the use-case you're describing here. But it >> seems to me that if you create a proxy that has the ability to do any admin >> operation, and give MM2 access to that proxy, the security model is the >> same as just giving MM2 admin access. (Or it may be worse if the sysadmin >> doesn't know what this proxy is doing, and doesn't lock it down...) >> > > MM2 runs with the assumption that it has > > - "CREATE" ACLs for topics on the source clusters to create > `heartbeat` topics. > - "CREATE" and "ALTER" ACLs to create topics, add partitions, update > topics' config and topics' ACLs (in future, will also include group ACLS as > Mikael mentioned before in the thread) on the destination clusters. > > Most organisations have some resource management or federated solutions > (some would even have a budget system as part of these systems) to manage > Kafka resources, and these systems are usually the only application allowed > to initializing a client with "CREATE" and "ALTER" ACLs. They don't grant > these ACLs to any other teams/groups/applications to create such a client > outside these systems, so assuming MM2 can bypass these systems and use the > AdminClient directly to create/update resources isn't valid. This is the > primary concern here. > > The KIP is trying to give MM2 more flexibility to allow organisations to > integrate MM2 with their resource management system as they see fit without > forcing them to disable most MM2 features. > > Hope this make sense and clear it up. > > > On Wed, May 11, 2022 at 9:09 PM Colin McCabe <cmcc...@apache.org> wrote: > >> Hi Omnia Ibrahim, >> >> I'm sorry, but I am -1 on adding competing Admin interfaces. This would >> create confusion and a heavier maintenance burden for the project. >> >> Since the org.apache.kafka.clients.admin.Admin interface is a Java >> interface, any third-party software that wants to insert its own >> implementation of the interface can do so already. >> >> A KIP to make the Admin class used pluggable for MM2 would be reasonable. >> Adding a competing admin API is not. >> >> It's true that there are many Admin methods, but you do not need to >> implement all of them -- just the ones that MirrorMaker uses. The other >> ones can throw a NotImplementedException or similar. >> >> > The current approach also assumes that the user running MM2 has the >> Admin right to >> > create/update topics, which is only valid if the user who runs MM2 also >> manages both >> > source and destination clusters. >> >> Maybe I'm misunderstanding the use-case you're describing here. But it >> seems to me that if you create a proxy that has the ability to do any admin >> operation, and give MM2 access to that proxy, the security model is the >> same as just giving MM2 admin access. (Or it may be worse if the sysadmin >> doesn't know what this proxy is doing, and doesn't lock it down...) >> >> best, >> Colin >> >> >> On Mon, May 9, 2022, at 13:21, Omnia Ibrahim wrote: >> > Hi, I gave the KIP another look after talking to some people at the >> Kafka >> > Summit in London. And I would like to clear up the motivation of this >> KIP. >> > >> > >> > At the moment, MM2 has some opinionated decisions that are creating >> issues >> > for teams that use IaC, federated solutions or have a capacity/budget >> > planning system for Kafka destination clusters. To explain it better, >> let's >> > assume we have MM2 with the following configurations to highlight these >> > problems. >> > >> > ``` >> > >> > topics = .* >> > >> > refresh.topics.enabled = true >> > >> > sync.topic.configs.enabled = true >> > >> > sync.topic.acls.enabled = true >> > >> > // Maybe in futrue we can have sync.group.acls.enabled = true >> > >> > ``` >> > >> > >> > These configurations allow us to run MM2 with the value of its full >> > features. However, there are two main concerns when we run on a scale >> with >> > these configs: >> > >> > 1. *Capacity/Budgeting Planning:* >> > >> > Functionality or features that impact capacity planning using MM2 are: >> > >> > 1. MM2 automatically creates topics (breaking the rule of >> > `auto.create.topics.enable=false`) and creates topic partitions on >> > destination clusters if the number of partitions increases on the >> source. >> > In the previous example, this functionality will apply to any topic >> that >> > matches the regex of the `topics` config. >> > 2. Sync topic configs include configurations that impact capacity >> like ` >> > retention.ms` and `retention.bytes`. >> > >> > These 2 points lead to adding new untracked capacity to destination >> > clusters without a way to count for them up-front or safeguard the >> cluster. >> > The team that runs the cluster will only see the capacity issue when >> their >> > disk usage hits the threshold for their alerts. The desk capacity issue >> can >> > be avoided if MM2 is flexible enough to >> > >> > - have a way for teams that run their ecosystem to have MM2 behave >> > within their system. >> > - disable the auto-creation and avoid syncing configs that impact >> > capacity >> > >> > >> > 2. *Provisioning conflict:* >> > >> > In the previous MM2 configurations; we ended up with conflict as MM2 >> used >> > `AdminClient` directly to perform the following functionality >> > >> > - Create a Kafka topic (no way to disable this at the moment) >> > - Add new Kafka partitions (no way to disable this at the moment) >> > - Sync Kafka Topic configurations (can be disabled, but then this >> > reduces the value of MM2 potential for users) >> > - Sync Kafka topic's ACLs (can be disabled, but this reduces the >> users' >> > value). Disabling this feature also means that users must ensure >> they have >> > the right ACLs to the mirrored topics on the destination cluster >> before >> > switching their consumers, especially when MM2 is used for disaster >> > recovery. It may lead to extra downtime for them. >> > >> > >> > All these functionalities are using AdminClient; which causes an issue >> with >> > teams that >> > >> > - Manage their Kafka resources using tools like Strimizi or custom >> > federated solutions. For example, Strimizi's UserOperator doesn't >> sync the >> > topic ACLs when MM2 is enabled. Strimzi documentation mentions that >> users >> > must to disable MM2 `sync.topic.acls.enabled` if they use >> `UserOperator`. >> > On the other hand, Strimizi's TopicOperator doesn't have the same >> issue >> > because it has a bi-directional reconciliation process that watches >> the >> > topics state on the Kafka cluster and updates KafkaTopic resources >> for >> > Strimzi. This design works fine with Kafka MM2 for Topics but not for >> > syncing ACLs. Strimizi TopicOperator also doesn't have a way to stop >> > syncing config that impact capacity for example retention configs. >> > >> > >> > - Teams that run MM2 but don't own the destination cluster. In this >> > case, these teams don't have Admin access, but they may have Kafka >> > management solutions, such as yahoo/CMAK or an in-house solution. >> For such >> > a tool as CMAK, these teams can update/create resources using CMAK >> REST API. >> > >> > >> > The Proposed KIP gives users the flexibility to integrate MM2 within >> their >> > ecosystem without disabling any MM2 features. We can achieve this >> > flexibility with one of the following solutions. >> > >> > 1. Introduce a new interface that hides Admin interactions in one >> place. >> > Then users can provide their way of resource management. As well as >> clean >> > up the MM2 code by having one place that manages the resources, as >> at the >> > moment, MM2 usage of AdminClient is all over the code. >> > 2. The second solution could be to add only a new config that allows >> the >> > users to override AdminClient with another implementation of the >> > AdminClient interface, as Ryanne suggested before. The downside is >> that >> > AdminClient is enormous and constantly under development, so any >> users who >> > opt-in for custom implementation will need to carry this burden. >> > >> > I favour the first solution as it will make it either later to add any >> new >> > feature related to resource management. But don't mind if others think >> that >> > the second solution is easier for MM2 design. >> > >> > >> > *Note*: There are two possible future KIPs following this KIP to >> > >> > 1. Add config to disable MM2 from auto creating or adding new topic >> > partitions. >> > 2. Add a way to exclude a specific topic's configuration from being >> > synced. >> > >> > >> > I hope this clears up the problem better. Please let me know what do you >> > think. >> > >> > >> > On Mon, Feb 7, 2022 at 3:26 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> >> > wrote: >> > >> >> Hi Mickael. Thanks for the feedback. I address some of your points >> below. >> >> >> >> *> This seems to address a relatively advanced and specific use case* >> >> The main point of the KIP is that MM2 is making a massive assumption >> that >> >> it has the right to alter/create resources. This assumption isn't >> valid in >> >> the world of Infra-as-Code, federated solutions and popularity of OS >> Kafka >> >> Kubernetes operators; these infra/resource management solutions aren't >> >> advanced use-cases anymore nowadays. These concerns had been raised in >> the >> >> past, especially regarding the assumption that MM2 can create topics >> on the >> >> destination cluster. For example, >> >> https://issues.apache.org/jira/browse/KAFKA-12753 and >> >> https://www.mail-archive.com/dev@kafka.apache.org/msg119340.html. >> >> The primary motivation is giving some power to data >> >> platform/infrastructure team to make MM2 part of their internal Kafka >> >> ecosystem without dropping the main features that make MM2 valuable, >> like >> >> syncing topic configs. For example, if someone uses any OS Kafka k8s >> >> operator, they can implement the class to interact with the k8s >> operator to >> >> create these resources. >> >> >> >> *> My initial concern is this may make it hard to evolve MirrorMaker as >> >> we'll likely need to update this new interface if new features are >> added.* >> >> I agree it's a disadvantage to adding a new interface however adding >> more >> >> admin interactions from MM2 to alter/create resources and access will >> feed >> >> the main issue as I mentioned above with the popularity of IaC and >> >> federated solutions; most data platform/infrastructure teams will endup >> >> disabling these new features. >> >> Also, at the moment, most MM2 interactions with the admin client are >> >> scattered across the codebase so having one place where all admin >> >> interactions are listed isn't a bad thing. >> >> >> >> *> For example if we wanted to sync group ACLs.* >> >> As I mentioned before, altering any resource's configurations with MM2 >> is >> >> the one main concern for any data platform/infrastructure team that >> wants >> >> to have control over their clusters and use MM2. So the main question >> with >> >> adding any new altering feature like sync group ACLs will raise the >> same >> >> question of how many teams will actually enable this feature. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> *>Regarding the proposed API, I have a few suggestions: >- What about >> >> using configure() instead of the constructor to pass the >> >configuration, >> >> especially as it's implementing Configurable >- It's not clear what >> all the >> >> arguments of createTopicPartitions()>are. What's the difference between >> >> partitionCounts and newPartitions?>Should we have separate methods for >> >> creating topics and partitions? >- Do we really need >> >> createCompactedTopic()? >- Instead of updateTopicConfigs() and >> updateAcls() >> >> should we use the >"alter" prefix to stay consistent with Admin?* >> >> >> >> These are good suggestions that will update the KIP to address these. >> >> Regarding the `createCompactedTopic` MM2 is using this method to create >> >> internal topics. >> >> >> >> Thanks >> >> >> >> On Wed, Jan 26, 2022 at 1:55 PM Mickael Maison < >> mickael.mai...@gmail.com> >> >> wrote: >> >> >> >>> Hi Omnia, >> >>> >> >>> Thanks for the KIP, sorry for taking so long to comment. I've only had >> >>> time to take a quick look so far. >> >>> >> >>> This seems to address a relatively advanced and specific use case. My >> >>> initial concern is this may make it hard to evolve MirrorMaker as >> >>> we'll likely need to update this new interface if new features are >> >>> added. For example if we wanted to sync group ACLs. >> >>> I'm wondering if it's something you've thought about. I'm not saying >> >>> it's a blocker but we have to weigh the pros and cons when introducing >> >>> new features. >> >>> >> >>> Regarding the proposed API, I have a few suggestions: >> >>> - What about using configure() instead of the constructor to pass the >> >>> configuration, especially as it's implementing Configurable >> >>> - It's not clear what all the arguments of createTopicPartitions() >> >>> are. What's the difference between partitionCounts and newPartitions? >> >>> Should we have separate methods for creating topics and partitions? >> >>> - Do we really need createCompactedTopic()? >> >>> - Instead of updateTopicConfigs() and updateAcls() should we use the >> >>> "alter" prefix to stay consistent with Admin? >> >>> >> >>> Thanks, >> >>> Mickael >> >>> >> >>> On Wed, Jan 26, 2022 at 11:26 AM Omnia Ibrahim < >> o.g.h.ibra...@gmail.com> >> >>> wrote: >> >>> > >> >>> > Hi, >> >>> > If there are no more concerns regarding the proposal can I get some >> >>> votes on the KIP here >> >>> https://lists.apache.org/thread/950lpxjb5kbjm8qdszlpxm9h4j4sfyjx >> >>> > >> >>> > Thanks >> >>> > >> >>> > On Wed, Oct 27, 2021 at 3:54 PM Ryanne Dolan <ryannedo...@gmail.com >> > >> >>> wrote: >> >>> >> >> >>> >> Well I'm convinced! Thanks for looking into it. >> >>> >> >> >>> >> Ryanne >> >>> >> >> >>> >> On Wed, Oct 27, 2021, 8:49 AM Omnia Ibrahim < >> o.g.h.ibra...@gmail.com> >> >>> wrote: >> >>> >> >> >>> >> > I checked the difference between the number of methods in the >> Admin >> >>> >> > interface and the number of methods MM2 invokes from Admin, and >> this >> >>> diff >> >>> >> > is enormous (it's more than 70 methods). >> >>> >> > As far as I can see, the following methods MM2 depends on (in >> >>> >> > MirrorSourceConnector, MirrorMaker, MirrorCheckpointTask and >> >>> >> > MirrorCheckpointConnector), this will leave 73 methods on the >> Admin >> >>> >> > interface that customer will need to return dummy data for, >> >>> >> > >> >>> >> > 1. create(conf) >> >>> >> > 2. close >> >>> >> > 3. listTopics() >> >>> >> > 4. createTopics(newTopics, createTopicsOptions) >> >>> >> > 5. createPartitions(newPartitions) >> >>> >> > 6. alterConfigs(configs) // this method is marked for >> >>> deprecation in >> >>> >> > Admin and the ConfigResource MM2 use is only TOPIC >> >>> >> > 7. createAcls(aclBindings) // the list of bindings always >> >>> filtered by >> >>> >> > TOPIC >> >>> >> > 8. describeAcls(aclBindingFilter) // filter is always >> >>> ANY_TOPIC_ACL >> >>> >> > 9. describeConfigs(configResources) // Always for TOPIC >> resources >> >>> >> > 10. listConsumerGroupOffsets(groupId) >> >>> >> > 11. listConsumerGroups() >> >>> >> > 12. alterConsumerGroupOffsets(groupId, offsets) >> >>> >> > 13. describeCluster() // this is invoked from >> >>> >> > ConnectUtils.lookupKafkaClusterId(conf), >> >>> >> > but MM2 isn't the one that initialize the AdminClient >> >>> >> > >> >>> >> > Going with the Admin interface in practice will make any custom >> Admin >> >>> >> > implementation humongous even for a fringe use case because of >> the >> >>> number >> >>> >> > of methods that need to return dummy data, >> >>> >> > >> >>> >> > I am still leaning toward a new interface as it abstract all >> MM2's >> >>> >> > interaction with Kafka Resources in one place; this makes it >> easier >> >>> to >> >>> >> > maintain and make it easier for the use cases where customers >> wish to >> >>> >> > provide a different method to handle resources. >> >>> >> > >> >>> >> > Omnia >> >>> >> > >> >>> >> > On Tue, Oct 26, 2021 at 4:10 PM Ryanne Dolan < >> ryannedo...@gmail.com> >> >>> >> > wrote: >> >>> >> > >> >>> >> > > I like the idea of failing-fast whenever a custom impl is >> >>> provided, but I >> >>> >> > > suppose that that could be done for Admin as well. I agree your >> >>> proposal >> >>> >> > is >> >>> >> > > more ergonomic, but maybe it's okay to have a little friction >> in >> >>> such >> >>> >> > > fringe use-cases. >> >>> >> > > >> >>> >> > > Ryanne >> >>> >> > > >> >>> >> > > >> >>> >> > > On Tue, Oct 26, 2021, 6:23 AM Omnia Ibrahim < >> >>> o.g.h.ibra...@gmail.com> >> >>> >> > > wrote: >> >>> >> > > >> >>> >> > > > Hey Ryanne, Thanks fo the quick feedback. >> >>> >> > > > Using the Admin interface would make everything easier, as >> MM2 >> >>> will >> >>> >> > need >> >>> >> > > > only to configure the classpath for the new implementation >> and >> >>> use it >> >>> >> > > > instead of AdminClient. >> >>> >> > > > However, I have two concerns >> >>> >> > > > 1. The Admin interface is enormous, and the MM2 users will >> need >> >>> to know >> >>> >> > > the >> >>> >> > > > list of methods MM2 depends on and override these only >> instead of >> >>> >> > > > implementing the whole Admin interface. >> >>> >> > > > 2. MM2 users will need keep an eye on any changes to Admin >> >>> interface >> >>> >> > that >> >>> >> > > > impact MM2 for example deprecating methods. >> >>> >> > > > Am not sure if adding these concerns on the users is >> acceptable >> >>> or not. >> >>> >> > > > One solution to address these concerns could be adding some >> >>> checks to >> >>> >> > > make >> >>> >> > > > sure the methods MM2 uses from the Admin interface exists to >> fail >> >>> >> > faster. >> >>> >> > > > What do you think >> >>> >> > > > >> >>> >> > > > Omnia >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Mon, Oct 25, 2021 at 11:24 PM Ryanne Dolan < >> >>> ryannedo...@gmail.com> >> >>> >> > > > wrote: >> >>> >> > > > >> >>> >> > > > > Thanks Omnia, neat idea. I wonder if we could use the >> existing >> >>> Admin >> >>> >> > > > > interface instead of defining a new one? >> >>> >> > > > > >> >>> >> > > > > Ryanne >> >>> >> > > > > >> >>> >> > > > > On Mon, Oct 25, 2021, 12:54 PM Omnia Ibrahim < >> >>> >> > o.g.h.ibra...@gmail.com> >> >>> >> > > > > wrote: >> >>> >> > > > > >> >>> >> > > > > > Hey everyone, >> >>> >> > > > > > Please take a look at KIP-787 >> >>> >> > > > > > >> >>> >> > > > > > >> >>> >> > > > > >> >>> >> > > > >> >>> >> > > >> >>> >> > >> >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-787%3A+MM2+Interface+to+manage+Kafka+resources >> >>> >> > > > > > < >> >>> >> > > > > > >> >>> >> > > > > >> >>> >> > > > >> >>> >> > > >> >>> >> > >> >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-787%3A+MM2+Interface+to+manage+Kafka+resources >> >>> >> > > > > > > >> >>> >> > > > > > >> >>> >> > > > > > Thanks for the feedback and support >> >>> >> > > > > > Omnia >> >>> >> > > > > > >> >>> >> > > > > >> >>> >> > > > >> >>> >> > > >> >>> >> > >> >>> >> >> >> >