mertdotcc opened a new issue, #4173:
URL: https://github.com/apache/camel-k/issues/4173
(I am aware that there are already discussions and existing issues regarding multi-tenancy and multi-operator configuration, and that these have been added to the roadmap for version 2. I am opening this issue because I think it is slightly different from what is already open. We can merge it into an existing issue or close it; I'd be fine with either. I just want to start a discussion and get different people's opinions.)

Right now we have 8-10 different integrations that we deploy as native images. We have a memory-optimized node with `16Gi` of memory, and we use taints and annotations to make sure that only the operator, and no other pod, runs on that node.

We first ran our operator with the `--operator-resources requests.memory=4096Mi` and `--operator-resources limits.memory=4096Mi` flags, as the documentation [suggests](https://github.com/apache/camel-k/blob/6f0037aef8a87d56509f2168cb510cd08d564d46/resources/traits.yaml#L1237) (this was before we created the operator-only memory-optimized node), but the native image build times were far too slow for our needs. We then created the memory-optimized node and deployed the operator with the `--operator-resources requests.memory=8192Mi` and `--operator-resources limits.memory=12288Mi` flags. Currently, the operator uses essentially all the memory we give it: in this case, the full 12 GiB.

We acknowledge that native image builds are resource- and time-intensive, and we can live with that. The problem starts when 2 developers working on 2 different integrations deploy their changes, and ArgoCD picks those changes up and triggers new builds for both integrations...
I am not sure whether there is a queuing mechanism or whether the 2 native image builds run concurrently (the operator logs get interleaved at that point, so it is hard to tell). What we would ideally like to see is the following: we have one operator running at all times (or, even better, 2-3 operators running at the same time for HA purposes, once that is supported), but whenever 2-3 or more images need to be built at the same time, a spot node instance is created, an operator is deployed on it, the build runs and finishes there, the image is pushed to our registry, and then the node instance is terminated.

This would not only be very cost-effective (which always makes it easier to sell to upper management) but would also ensure that when 2 images need to be built (or are being built), the wait time does not grow from ~10 minutes to 20+ minutes, as it currently does in our use case.

What are your thoughts?
