kunwp1 opened a new issue, #5362:
URL: https://github.com/apache/texera/issues/5362

   ### Task Summary
   
   During the dkNET-AI launch, we noticed that a computing unit (CU) keeps running
when a user leaves the platform without terminating it. Because CUs are
per-user compute pods, these idle CUs keep holding CPU and memory and pin their
EKS nodes, causing significant resource underutilization and cost.
   
   We need to (1) define what makes a CU "idle" and (2) add a mechanism that 
automatically terminates idle CUs.
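As a possible starting point for (1), here is a minimal Python sketch of one idle rule. It assumes that, for each CU, we can observe the time of the last user interaction and the number of workflow executions in progress. `ComputingUnit`, `last_activity`, `running_executions`, `is_idle`, and `select_idle` are hypothetical names, not existing Texera code, and the 30-minute timeout is an arbitrary example value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class ComputingUnit:
    # Hypothetical per-CU state; field names are illustrative assumptions.
    cu_id: str
    last_activity: datetime  # last user interaction observed for this CU
    running_executions: int  # workflow executions currently in progress


def is_idle(cu: ComputingUnit, now: datetime, idle_timeout: timedelta) -> bool:
    # Idle = nothing is running AND no user activity for at least idle_timeout.
    return cu.running_executions == 0 and now - cu.last_activity >= idle_timeout


def select_idle(
    cus: Iterable[ComputingUnit], now: datetime, idle_timeout: timedelta
) -> List[str]:
    # IDs of the CUs that the termination mechanism (2) should stop.
    return [cu.cu_id for cu in cus if is_idle(cu, now, idle_timeout)]


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    cus = [
        ComputingUnit("cu-a", now - timedelta(minutes=90), 0),  # user left
        ComputingUnit("cu-b", now - timedelta(minutes=90), 1),  # workflow running
        ComputingUnit("cu-c", now - timedelta(minutes=5), 0),   # recently used
    ]
    print(select_idle(cus, now, timedelta(minutes=30)))  # ['cu-a']
```

Requiring both conditions is a deliberate choice. A timeout based only on user activity would terminate a CU that is still executing a long workflow after the user has closed the browser tab.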
   
   Based on an investigation by @chenlica and me, Kubernetes has no built-in
mechanism to terminate a pod for inactivity. It stops pods automatically for
health, resource, or lifecycle reasons (eviction, OOM kills, node failure,
`activeDeadlineSeconds`), but never simply because a workload is idle.
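Because Kubernetes will not terminate idle pods on its own, the termination has to come from an external loop, such as a periodic job or a small controller, that deletes CU pods itself. The Python sketch below is illustrative only. It also shows why `activeDeadlineSeconds` does not fit this use case. That deadline is counted from pod start time and ignores activity, so it would kill a busy CU once the deadline passes, while an idle CU would linger until the same deadline.

The sketch assumes that per-pod last-activity timestamps are available. `deadline_exceeded`, `idle_exceeded`, `reap_idle_pods`, and the `delete_pod` callback are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Mapping


def deadline_exceeded(start_time: datetime, now: datetime, active_deadline_seconds: int) -> bool:
    # Mirrors activeDeadlineSeconds: counted from pod start, activity is ignored.
    return (now - start_time).total_seconds() > active_deadline_seconds


def idle_exceeded(last_activity: datetime, now: datetime, idle_timeout_seconds: int) -> bool:
    # What we actually want: counted from the last observed user activity.
    return (now - last_activity).total_seconds() > idle_timeout_seconds


def reap_idle_pods(
    last_activity_by_pod: Mapping[str, datetime],
    now: datetime,
    idle_timeout_seconds: int,
    delete_pod: Callable[[str], None],
) -> List[str]:
    # One pass of a hypothetical external reaper; it would be run periodically.
    # delete_pod is injected so the logic can be exercised without a cluster.
    reaped: List[str] = []
    for pod_name, last_activity in sorted(last_activity_by_pod.items()):
        if idle_exceeded(last_activity, now, idle_timeout_seconds):
            delete_pod(pod_name)
            reaped.append(pod_name)
    return reaped


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    started = now - timedelta(hours=3)
    # A CU started 3 h ago whose user was active 1 minute ago:
    print(deadline_exceeded(started, now, 2 * 3600))  # True: busy CU would be killed
    print(idle_exceeded(now - timedelta(minutes=1), now, 30 * 60))  # False: kept
    deleted: List[str] = []
    activity = {"cu-a": now - timedelta(hours=2), "cu-b": now - timedelta(minutes=5)}
    print(reap_idle_pods(activity, now, 30 * 60, deleted.append))  # ['cu-a']
```

In a real deployment, `delete_pod` could wrap `kubernetes.client.CoreV1Api().delete_namespaced_pod(name=pod_name, namespace=...)` from the official Kubernetes Python client. The sketch leaves two questions open: which namespace to use, and how the last-activity timestamps are collected.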
   
   Related links:
   
   - https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
   - https://cloud.google.com/blog/products/containers-kubernetes/scale-to-zero-on-gke-with-keda
   - https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/
   - https://kubernetes.io/docs/concepts/workloads/controllers/job/
   
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [ ] Other

