kunwp1 opened a new issue, #5362: URL: https://github.com/apache/texera/issues/5362
### Task Summary

During the dkNET-AI launch, we noticed that a computing unit (CU) keeps running when a user leaves the platform without terminating it. Because CUs are per-user compute pods, these idle CUs hold CPU and memory and pin their EKS nodes, causing significant resource underutilization and cost.

We need to (1) define what makes a CU "idle" and (2) add a mechanism that automatically terminates idle CUs.

Based on @chenlica's and my investigation, Kubernetes has no built-in mechanism to terminate a pod for inactivity. Kubernetes stops pods automatically for health, resource, or lifecycle reasons (eviction, OOM, node failure, `activeDeadlineSeconds`), but never simply because a workload is idle.

Related links:

- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- https://cloud.google.com/blog/products/containers-kubernetes/scale-to-zero-on-gke-with-keda
- https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/
- https://kubernetes.io/docs/concepts/workloads/controllers/job/

### Task Type

- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
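As a starting point for discussion of item (1) in the task summary, one possible idle rule can be sketched as follows. This is only an illustration, not the agreed definition: the `CuActivity` record, its fields, and the 30-minute timeout in the usage note are all hypothetical, and the sketch deliberately leaves out how activity is collected and how a selected CU would actually be terminated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class CuActivity:
    """Hypothetical activity snapshot for one computing unit (CU)."""
    cu_id: str
    last_user_activity: datetime  # assumed: last time a user session touched the CU
    has_running_workflow: bool    # assumed: True while any execution is in progress


def is_idle(cu: CuActivity, now: datetime, idle_timeout: timedelta) -> bool:
    """A CU is idle if nothing is running and no user activity was seen
    for at least idle_timeout. This rule is an assumption, not a decision."""
    if cu.has_running_workflow:
        return False
    return now - cu.last_user_activity >= idle_timeout


def select_idle_cus(
    cus: Iterable[CuActivity], now: datetime, idle_timeout: timedelta
) -> List[str]:
    """Return the IDs of CUs that a periodic reaper could terminate."""
    return [cu.cu_id for cu in cus if is_idle(cu, now, idle_timeout)]
```

A periodic job could call `select_idle_cus(snapshots, datetime.now(timezone.utc), timedelta(minutes=30))` and pass the resulting IDs to whatever termination path the platform already uses when a user stops a CU manually. Treating a running workflow as activity avoids killing long executions whose owner has simply closed the browser.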
