satishglondhe commented on issue #5976:
URL: https://github.com/apache/cloudstack/issues/5976#issuecomment-1066448292


   Hi @davidjumani, thank you for helping us through this. I created a cluster 
with k8s versions 1.22.6 and 1.23.3, enabled autoscaling, changed the autoscaler 
image to davidjumani/cloudstack-kubernetes-autoscaler:name-dev, deployed the 
metrics server, created an HPA, and then increased the load on a deployment. The 
cluster scaled up from 1 node to 2, and the pending pods were scheduled on the 
new node. The setup I used is sketched below, followed by the scale-up logs.
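   
   This is only a minimal sketch of how I drove the scale-up: the metrics-server 
manifest URL, the HPA thresholds, and the load-generator pod are my own choices; 
only the php-apache deployment name matches the logs that follow.
   
   ```
   # Deploy the metrics server so the HPA has CPU metrics to act on
   kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
   
   # Create an HPA on the php-apache deployment (thresholds are illustrative)
   kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
   
   # Generate load against the service so the HPA adds replicas and the
   # cluster autoscaler sees pending pods
   kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
     /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
   ```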
   
   ```
   I0313 15:45:26.022450       1 cloudstack_manager.go:88] Got cluster : 
&{7c06e610-145c-4806-be2b-99f37a2b2b75 voyager 1 3 1 1 [0xc000d6d2f0 
0xc000d6d320]}
   W0313 15:45:26.022732       1 clusterstate.go:590] Failed to get nodegroup 
for 52b47c11-175a-441c-a5a4-96e83971f32a: Unable to find node 
52b47c11-175a-441c-a5a4-96e83971f32a in cluster
   W0313 15:45:26.022765       1 clusterstate.go:590] Failed to get nodegroup 
for 4cbcb62e-99ca-4d00-af22-e1eaae2ae691: Unable to find node 
4cbcb62e-99ca-4d00-af22-e1eaae2ae691 in cluster
   I0313 15:45:26.022794       1 static_autoscaler.go:341] 2 unregistered nodes 
present
   I0313 15:45:26.022849       1 filter_out_schedulable.go:65] Filtering out 
schedulables
   I0313 15:45:26.022867       1 filter_out_schedulable.go:137] Filtered out 0 
pods using hints
   I0313 15:45:26.022902       1 filter_out_schedulable.go:175] 0 pods were 
kept as unschedulable based on caching
   I0313 15:45:26.022925       1 filter_out_schedulable.go:176] 0 pods marked 
as unschedulable can be scheduled.
   I0313 15:45:26.023018       1 filter_out_schedulable.go:87] No schedulable 
pods
   I0313 15:45:26.023105       1 klogx.go:86] Pod 
default/php-apache-7656945b6b-nm8b6 is unschedulable
   I0313 15:45:26.023142       1 scale_up.go:376] Upcoming 0 nodes
   I0313 15:45:26.023296       1 scale_up.go:468] Best option to resize: 
7c06e610-145c-4806-be2b-99f37a2b2b75
   I0313 15:45:26.024052       1 scale_up.go:472] Estimated 1 nodes needed in 
7c06e610-145c-4806-be2b-99f37a2b2b75
   I0313 15:45:26.024215       1 scale_up.go:595] Final scale-up plan: 
[{7c06e610-145c-4806-be2b-99f37a2b2b75 1->2 (max: 3)}]
   I0313 15:45:26.024263       1 scale_up.go:691] Scale-up: setting group 
7c06e610-145c-4806-be2b-99f37a2b2b75 size to 2
   I0313 15:45:26.024890       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", 
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df", 
APIVersion:"v1", ResourceVersion:"8822", FieldPath:""}): type: 'Normal' reason: 
'ScaledUpGroup' Scale-up: setting group 7c06e610-145c-4806-be2b-99f37a2b2b75 
size to 2 instead of 1 (max: 3)
   I0313 15:45:26.028139       1 cloudstack_node_group.go:57] Increase Cluster 
: 7c06e610-145c-4806-be2b-99f37a2b2b75 by 1
   I0313 15:45:26.028330       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&response=json&size=2&signature=***
   I0313 15:45:26.056998       1 client.go:175] NewAPIRequest response status 
code:200
   I0313 15:45:36.057714       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
   I0313 15:45:36.076154       1 client.go:175] NewAPIRequest response status 
code:200
   I0313 15:45:36.078394       1 client.go:110] Still waiting for job 
eb1f348f-5a56-4543-b842-bd168da8192a to complete
   I0313 15:45:43.085178       1 reflector.go:536] 
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309:
 Watch close - *v1beta1.PodDisruptionBudget total 0 items received
   W0313 15:45:43.097076       1 warnings.go:70] policy/v1beta1 
PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use 
policy/v1 PodDisruptionBudget
   I0313 15:45:46.058218       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
   I0313 15:45:46.076218       1 client.go:175] NewAPIRequest response status 
code:200
   I0313 15:45:46.076674       1 client.go:110] Still waiting for job 
eb1f348f-5a56-4543-b842-bd168da8192a to complete
   I0313 15:45:49.284991       1 reflector.go:255] Listing and watching *v1beta1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
   W0313 15:45:49.293329       1 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
   E0313 15:45:49.293574       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
   I0313 15:45:53.108193       1 reflector.go:536] 
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356:
 Watch close - *v1.StatefulSet total 0 items received
   I0313 15:45:56.058337       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
   I0313 15:45:56.077064       1 client.go:175] NewAPIRequest response status 
code:200
   ```
   
   However, when I removed the load, the autoscaler detected from the CPU 
metrics that the new node was no longer needed and moved its pods back to the 
old node. When it then tried to delete the new node, the operation failed. The 
scale-down logs follow, and after them a sketch of how the node IDs could be 
cross-checked against CloudStack.
   
   ```
   I0313 16:10:42.751483       1 scale_down.go:418] Skipping 
voyager-control-17f83bf2095 from delete consideration - the node is marked as 
no scale down
   I0313 16:10:42.751496       1 scale_down.go:444] Node 
voyager-node-17f83bf431c is not suitable for removal - memory utilization too 
big (0.519572)
   I0313 16:10:42.751526       1 scale_down.go:448] Node 
voyager-node-17f83f4b273 - cpu utilization 0.150000
   I0313 16:10:42.751546       1 cluster.go:139] voyager-node-17f83f4b273 for 
removal
   I0313 16:10:42.751581       1 cluster.go:322] Pod 
default/php-apache-7656945b6b-nm8b6 can be moved to voyager-node-17f83bf431c
   I0313 16:10:42.751594       1 cluster.go:167] node voyager-node-17f83f4b273 
may be removed
   I0313 16:10:42.751646       1 static_autoscaler.go:523] 
voyager-node-17f83f4b273 is unneeded since 2022-03-13 16:00:39.61791118 +0000 
UTC m=+1238.053807358 duration 10m3.096154674s
   I0313 16:10:42.751677       1 static_autoscaler.go:534] Scale down status: 
unneededOnly=false lastScaleUpTime=2022-03-13 15:45:25.988144618 +0000 UTC 
m=+324.424040837 lastScaleDownDeleteTime=2022-03-13 14:40:24.439401321 +0000 
UTC m=-3577.124702365 lastScaleDownFailTime=2022-03-13 14:40:24.439401321 +0000 
UTC m=-3577.124702365 scaleDownForbidden=false isDeleteInProgress=false 
scaleDownInCooldown=false
   I0313 16:10:42.751693       1 static_autoscaler.go:547] Starting scale down
   I0313 16:10:42.751715       1 scale_down.go:828] voyager-node-17f83f4b273 
was unneeded for 10m3.096154674s
   I0313 16:10:42.751731       1 cluster.go:139] voyager-node-17f83f4b273 for 
removal
   I0313 16:10:42.751768       1 cluster.go:322] Pod 
default/php-apache-7656945b6b-nm8b6 can be moved to voyager-node-17f83bf431c
   I0313 16:10:42.751781       1 cluster.go:167] node voyager-node-17f83f4b273 
may be removed
   I0313 16:10:42.751804       1 scale_down.go:980] Scale-down: removing node 
voyager-node-17f83f4b273, utilization: {0.15 0.1082440988995888 0 cpu 0.15}, 
pods to reschedule: default/php-apache-7656945b6b-nm8b6
   I0313 16:10:42.752137       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", 
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df", 
APIVersion:"v1", ResourceVersion:"13552", FieldPath:""}): type: 'Normal' 
reason: 'ScaleDown' Scale-down: removing node voyager-node-17f83f4b273, 
utilization: {0.15 0.1082440988995888 0 cpu 0.15}, pods to reschedule: 
default/php-apache-7656945b6b-nm8b6
   I0313 16:10:42.763508       1 delete.go:103] Successfully added 
ToBeDeletedTaint on node voyager-node-17f83f4b273
   I0313 16:10:42.765309       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Node", Namespace:"", 
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0", 
APIVersion:"v1", ResourceVersion:"12899", FieldPath:""}): type: 'Normal' 
reason: 'ScaleDown' marked the node as toBeDeleted/unschedulable
   I0313 16:10:42.772269       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", 
Name:"weave-net-qjv74", UID:"526d282a-5c5e-41f9-af1c-5008730dfaae", 
APIVersion:"v1", ResourceVersion:"9683", FieldPath:""}): type: 'Normal' reason: 
'ScaleDown' deleting pod for node scale down
   I0313 16:10:42.776486       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", 
Name:"php-apache-7656945b6b-nm8b6", UID:"b68a74d7-374a-4367-9784-737e3bdd3f80", 
APIVersion:"v1", ResourceVersion:"9943", FieldPath:""}): type: 'Normal' reason: 
'ScaleDown' deleting pod for node scale down
   I0313 16:10:42.808539       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", 
Name:"kube-proxy-shdrg", UID:"33ad9fac-0cf5-495c-9147-062d874f3005", 
APIVersion:"v1", ResourceVersion:"9660", FieldPath:""}): type: 'Normal' reason: 
'ScaleDown' deleting pod for node scale down
   E0313 16:10:42.830691       1 scale_down.go:1349] Not deleted yet 
default/php-apache-7656945b6b-nm8b6
   I0313 16:10:45.116519       1 reflector.go:536] 
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356:
 Watch close - *v1.StatefulSet total 7 items received
   I0313 16:10:47.836224       1 scale_down.go:1360] All pods removed from 
voyager-node-17f83f4b273
   I0313 16:10:47.836668       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115&response=json&signature=***
   I0313 16:10:47.849279       1 client.go:175] NewAPIRequest response status 
code:431
   I0313 16:10:47.851936       1 delete.go:197] Releasing taint 
{Key:ToBeDeletedByClusterAutoscaler Value:1647187842 Effect:NoSchedule 
TimeAdded:<nil>} on node voyager-node-17f83f4b273
   I0313 16:10:47.860208       1 delete.go:228] Successfully released 
ToBeDeletedTaint on node voyager-node-17f83f4b273
   E0313 16:10:47.860245       1 scale_down.go:1006] Failed to delete 
voyager-node-17f83f4b273: failed to delete voyager-node-17f83f4b273: Unable to 
delete [b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error 
code 9999) Unable to execute API command scalekubernetescluster due to invalid 
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due 
to incorrect long value format, or entity does not exist or due to incorrect 
parameter annotation for the field in api cmd class.
   I0313 16:10:47.860284       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Node", Namespace:"", 
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0", 
APIVersion:"v1", ResourceVersion:"12899", FieldPath:""}): type: 'Warning' 
reason: 'ScaleDownFailed' failed to delete the node
   I0313 16:10:50.050998       1 reflector.go:255] Listing and watching *v1.CSIDriver from k8s.io/client-go/informers/factory.go:134
   W0313 16:10:50.055262       1 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
   E0313 16:10:50.055307       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
   I0313 16:10:52.766096       1 static_autoscaler.go:235] Starting main loop
   I0313 16:10:52.766309       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=listKubernetesClusters&id=7c06e610-145c-4806-be2b-99f37a2b2b75&response=json&signature=***
   I0313 16:10:52.802153       1 client.go:175] NewAPIRequest response status 
code:200
   I0313 16:10:52.802615       1 cloudstack_manager.go:88] Got cluster : 
&{7c06e610-145c-4806-be2b-99f37a2b2b75 voyager 1 3 2 1 [0xc0010993b0 
0xc0010993e0 0xc001099410]}
   W0313 16:10:52.803058       1 clusterstate.go:590] Failed to get nodegroup 
for 4cbcb62e-99ca-4d00-af22-e1eaae2ae691: Unable to find node 
4cbcb62e-99ca-4d00-af22-e1eaae2ae691 in cluster
   W0313 16:10:52.803075       1 clusterstate.go:590] Failed to get nodegroup 
for 52b47c11-175a-441c-a5a4-96e83971f32a: Unable to find node 
52b47c11-175a-441c-a5a4-96e83971f32a in cluster
   W0313 16:10:52.803080       1 clusterstate.go:590] Failed to get nodegroup 
for cec92edd-2c62-450b-8af5-ced062c7a84d: Unable to find node 
cec92edd-2c62-450b-8af5-ced062c7a84d in cluster
   I0313 16:10:52.803099       1 static_autoscaler.go:341] 3 unregistered nodes 
present
   I0313 16:10:52.803108       1 static_autoscaler.go:624] Removing 
unregistered node cec92edd-2c62-450b-8af5-ced062c7a84d
   W0313 16:10:52.803114       1 static_autoscaler.go:627] Failed to get node 
group for cec92edd-2c62-450b-8af5-ced062c7a84d: Unable to find node 
cec92edd-2c62-450b-8af5-ced062c7a84d in cluster
   W0313 16:10:52.803120       1 static_autoscaler.go:346] Failed to remove 
unregistered nodes: Unable to find node cec92edd-2c62-450b-8af5-ced062c7a84d in 
cluster
   ```
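   
   The failed delete sends nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115 to 
scaleKubernetesCluster, and CloudStack rejects it as an id it cannot resolve. A 
quick way to see which IDs CloudStack itself holds for the cluster VMs is 
sketched below with CloudMonkey; this is only a guess at a debugging step and 
assumes a configured cmk client (the exact output fields depend on the 
CloudStack version).
   
   ```
   # List the Kubernetes cluster and the VMs CloudStack knows for it, then
   # compare their ids with the nodeids value the autoscaler sent
   cmk list kubernetesclusters id=7c06e610-145c-4806-be2b-99f37a2b2b75
   
   # The worker VM can also be looked up by the name Kubernetes shows
   cmk list virtualmachines keyword=voyager-node-17f83f4b273
   ```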
   
   These are the logs from about 20 minutes later, when the autoscaler retried 
the scale-down:
   
   ```
   I0313 16:31:09.238844       1 static_autoscaler.go:547] Starting scale down
   I0313 16:31:09.238889       1 scale_down.go:828] voyager-node-17f83f4b273 
was unneeded for 10m3.256641915s
   I0313 16:31:09.238905       1 scale_down.go:1102] Scale-down: removing empty 
node voyager-node-17f83f4b273
   I0313 16:31:09.239138       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", 
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df", 
APIVersion:"v1", ResourceVersion:"17347", FieldPath:""}): type: 'Normal' 
reason: 'ScaleDownEmpty' Scale-down: removing empty node 
voyager-node-17f83f4b273
   I0313 16:31:09.255891       1 delete.go:103] Successfully added 
ToBeDeletedTaint on node voyager-node-17f83f4b273
   I0313 16:31:09.256402       1 client.go:169] NewAPIRequest API request 
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115&response=json&signature=***
   I0313 16:31:09.267422       1 client.go:175] NewAPIRequest response status 
code:431
   E0313 16:31:09.269341       1 scale_down.go:1144] Problem with empty node 
deletion: failed to delete voyager-node-17f83f4b273: Unable to delete 
[b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error code 
9999) Unable to execute API command scalekubernetescluster due to invalid 
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due 
to incorrect long value format, or entity does not exist or due to incorrect 
parameter annotation for the field in api cmd class.
   I0313 16:31:09.272042       1 delete.go:197] Releasing taint 
{Key:ToBeDeletedByClusterAutoscaler Value:1647189069 Effect:NoSchedule 
TimeAdded:<nil>} on node voyager-node-17f83f4b273
   I0313 16:31:09.279999       1 delete.go:228] Successfully released 
ToBeDeletedTaint on node voyager-node-17f83f4b273
   I0313 16:31:09.306255       1 event_sink_logging_wrapper.go:48] 
Event(v1.ObjectReference{Kind:"Node", Namespace:"", 
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0", 
APIVersion:"v1", ResourceVersion:"16596", FieldPath:""}): type: 'Warning' 
reason: 'ScaleDownFailed' failed to delete empty node: failed to delete 
voyager-node-17f83f4b273: Unable to delete 
[b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error code 
9999) Unable to execute API command scalekubernetescluster due to invalid 
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due 
to incorrect long value format, or entity does not exist or due to incorrect 
parameter annotation for the field in api cmd class.
   ```
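   
   To narrow down whether the problem is on the autoscaler side or on the 
CloudStack side, the same scaleKubernetesCluster call could be issued by hand 
with the id the autoscaler used. This is again only a sketch via CloudMonkey, 
assuming cmk is pointed at this management server:
   
   ```
   # Reproduce the autoscaler's request outside the autoscaler. If this also
   # fails with error code 9999 / "incorrect long value format", the issue is
   # in how the scaleKubernetesCluster API handles the nodeids parameter.
   cmk scale kubernetescluster id=7c06e610-145c-4806-be2b-99f37a2b2b75 \
     nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115
   ```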

