[GitHub] [camel-k] mertdotcc opened a new issue, #4173: Horizontal scaling ability for the operators

via GitHub Mon, 27 Mar 2023 22:02:01 -0700


mertdotcc opened a new issue, #4173:
URL: https://github.com/apache/camel-k/issues/4173


   (I am aware that there are already discussions and existing issues regarding 
multi-tenancy and multi-operator configuration, and that these have been added 
to the roadmap for version 2 as well. I am opening this issue since I think 
it's slightly different than what's open out there. We can merge this in with 
an existing issue, or close it, I'd be fine with it. I just want to start a 
discussion and get different people's opinions.)
   
   Right now we deploy 8-10 different integrations as native images. We have a 
memory-optimized node with `16Gi` of memory, and we use taints and annotations 
to make sure that only the operator runs on that node and no other pod.
   
   We first tried running our operator with the 
`--operator-resources requests.memory=4096Mi` and 
`--operator-resources limits.memory=4096Mi` flags (this was before we created 
our operator-only memory-optimized node), as the documentation 
[suggests](https://github.com/apache/camel-k/blob/6f0037aef8a87d56509f2168cb510cd08d564d46/resources/traits.yaml#L1237),
 but the native image build times were dreadfully slow for our needs.
   
   Then we created our memory-optimized node and deployed our operator with the 
`--operator-resources requests.memory=8192Mi` and 
`--operator-resources limits.memory=12288Mi` flags.
   
   The current situation is that the operator uses basically all the memory we 
throw at it, in this case all 12Gi. We acknowledge that the native image build 
process is resource- and time-intensive, and we can live with that.
   
   The problem starts when 2 developers working on 2 different integrations 
deploy their changes, and our ArgoCD picks those changes up and initiates new 
builds for both integrations. I am not sure whether there is a queue mechanism 
or whether those 2 native image builds run at the same time (the operator logs 
get scrambled at that moment, so it is hard to tell).
   
   What we would ideally like to see is the following: one operator runs at all 
times (or, even better, 2-3 operators run at the same time for HA purposes, 
once that is supported). Whenever 2-3 or more images need to be built at a 
given time, a spot node instance is created, an operator is deployed there, the 
build runs and finishes there, the image gets pushed into our registry, and 
then the node instance gets killed. This would not only be very cost-effective 
(which always makes it easier to sell to upper management) but would also 
ensure that when 2 images need to be built (or are being built), the wait time 
doesn't increase from ~10 minutes to 20+ minutes, as it currently does in our 
use case.
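   
   To make the wait-time argument concrete, here is a rough back-of-the-envelope 
sketch. It is only a toy model, not a description of how the Camel K operator 
actually schedules builds: it assumes each native build takes about 10 minutes 
when run alone, that builds are handed in order to whichever builder frees up 
first, and that builds on separate builders do not slow each other down. The 
function name `makespan_minutes` and the `builders` parameter are made up for 
this illustration.

```python
import heapq


def makespan_minutes(build_minutes, builders):
    """Time until the last build finishes (toy model, hypothetical).

    Each build is assigned, in submission order, to whichever builder
    becomes idle first. Builders are assumed not to interfere with
    each other, which is an assumption, not measured behavior.
    """
    if builders < 1:
        raise ValueError("need at least one builder")
    # Min-heap of the times at which each builder becomes idle.
    idle_at = [0.0] * builders
    heapq.heapify(idle_at)
    last_finish = 0.0
    for minutes in build_minutes:
        start = heapq.heappop(idle_at)  # earliest available builder
        end = start + minutes
        last_finish = max(last_finish, end)
        heapq.heappush(idle_at, end)
    return last_finish


# Two ~10 minute builds arriving together (figures from this issue).
builds = [10, 10]
print(makespan_minutes(builds, builders=1))  # 20.0, builds run back to back
print(makespan_minutes(builds, builders=2))  # 10.0, one builder per build
```

   With a single builder the second developer waits roughly twice as long, which 
roughly matches the 10 to 20+ minute jump we see today; with one extra 
on-demand builder per pending build, the wait would stay near the 
single-build time under these assumptions.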
   
   What are your thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

