satishglondhe commented on issue #5976:
URL: https://github.com/apache/cloudstack/issues/5976#issuecomment-1066448292
Hi @davidjumani, thank you for helping us through. I created clusters with
k8s versions 1.22.6 and 1.23.3, enabled autoscaling, changed the image to
davidjumani/cloudstack-kubernetes-autoscaler:name-dev, deployed the metrics
server, created an HPA, and increased the load on a deployment. I could see the
cluster scale from 1 node to 2, and the pending pods were scheduled on the new
node. The logs are as follows.
```
I0313 15:45:26.022450 1 cloudstack_manager.go:88] Got cluster :
&{7c06e610-145c-4806-be2b-99f37a2b2b75 voyager 1 3 1 1 [0xc000d6d2f0
0xc000d6d320]}
W0313 15:45:26.022732 1 clusterstate.go:590] Failed to get nodegroup
for 52b47c11-175a-441c-a5a4-96e83971f32a: Unable to find node
52b47c11-175a-441c-a5a4-96e83971f32a in cluster
W0313 15:45:26.022765 1 clusterstate.go:590] Failed to get nodegroup
for 4cbcb62e-99ca-4d00-af22-e1eaae2ae691: Unable to find node
4cbcb62e-99ca-4d00-af22-e1eaae2ae691 in cluster
I0313 15:45:26.022794 1 static_autoscaler.go:341] 2 unregistered nodes
present
I0313 15:45:26.022849 1 filter_out_schedulable.go:65] Filtering out
schedulables
I0313 15:45:26.022867 1 filter_out_schedulable.go:137] Filtered out 0
pods using hints
I0313 15:45:26.022902 1 filter_out_schedulable.go:175] 0 pods were
kept as unschedulable based on caching
I0313 15:45:26.022925 1 filter_out_schedulable.go:176] 0 pods marked
as unschedulable can be scheduled.
I0313 15:45:26.023018 1 filter_out_schedulable.go:87] No schedulable
pods
I0313 15:45:26.023105 1 klogx.go:86] Pod
default/php-apache-7656945b6b-nm8b6 is unschedulable
I0313 15:45:26.023142 1 scale_up.go:376] Upcoming 0 nodes
I0313 15:45:26.023296 1 scale_up.go:468] Best option to resize:
7c06e610-145c-4806-be2b-99f37a2b2b75
I0313 15:45:26.024052 1 scale_up.go:472] Estimated 1 nodes needed in
7c06e610-145c-4806-be2b-99f37a2b2b75
I0313 15:45:26.024215 1 scale_up.go:595] Final scale-up plan:
[{7c06e610-145c-4806-be2b-99f37a2b2b75 1->2 (max: 3)}]
I0313 15:45:26.024263 1 scale_up.go:691] Scale-up: setting group
7c06e610-145c-4806-be2b-99f37a2b2b75 size to 2
I0313 15:45:26.024890 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system",
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df",
APIVersion:"v1", ResourceVersion:"8822", FieldPath:""}): type: 'Normal' reason:
'ScaledUpGroup' Scale-up: setting group 7c06e610-145c-4806-be2b-99f37a2b2b75
size to 2 instead of 1 (max: 3)
I0313 15:45:26.028139 1 cloudstack_node_group.go:57] Increase Cluster
: 7c06e610-145c-4806-be2b-99f37a2b2b75 by 1
I0313 15:45:26.028330 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&response=json&size=2&signature=***
I0313 15:45:26.056998 1 client.go:175] NewAPIRequest response status
code:200
I0313 15:45:36.057714 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
I0313 15:45:36.076154 1 client.go:175] NewAPIRequest response status
code:200
I0313 15:45:36.078394 1 client.go:110] Still waiting for job
eb1f348f-5a56-4543-b842-bd168da8192a to complete
I0313 15:45:43.085178 1 reflector.go:536]
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309:
Watch close - *v1beta1.PodDisruptionBudget total 0 items received
W0313 15:45:43.097076 1 warnings.go:70] policy/v1beta1
PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use
policy/v1 PodDisruptionBudget
I0313 15:45:46.058218 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
I0313 15:45:46.076218 1 client.go:175] NewAPIRequest response status
code:200
I0313 15:45:46.076674 1 client.go:110] Still waiting for job
eb1f348f-5a56-4543-b842-bd168da8192a to complete
I0313 15:45:49.284991 1 reflector.go:255] Listing and watching
*v1beta1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
W0313 15:45:49.293329 1 reflector.go:324]
k8s.io/client-go/informers/factory.go:134: failed to list
*v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden:
User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list
resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster
scope
E0313 15:45:49.293574 1 reflector.go:138]
k8s.io/client-go/informers/factory.go:134: Failed to watch
*v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity:
csistoragecapacities.storage.k8s.io is forbidden: User
"system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource
"csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
I0313 15:45:53.108193 1 reflector.go:536]
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356:
Watch close - *v1.StatefulSet total 0 items received
I0313 15:45:56.058337 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=eb1f348f-5a56-4543-b842-bd168da8192a&response=json&signature=***
I0313 15:45:56.077064 1 client.go:175] NewAPIRequest response status
code:200
```
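The repeated `queryAsyncJobResult` calls (client.go:169/110) in the log above are the provider polling CloudStack's async-job API until `scaleKubernetesCluster` finishes. A minimal sketch of that loop, assuming CloudStack's usual `jobstatus` convention (0 = pending, 1 = succeeded, 2 = failed); the function and parameter names here are illustrative, not the actual client.go code:

```python
import time

# CloudStack async jobs report jobstatus 0 while pending,
# 1 on success, and 2 on failure.
PENDING, SUCCEEDED, FAILED = 0, 1, 2

def wait_for_job(query, job_id, interval=10.0, timeout=300.0, sleep=time.sleep):
    """Poll queryAsyncJobResult until the job leaves the pending state.

    `query` performs one queryAsyncJobResult round trip and returns the
    decoded response dict; `sleep` is injectable for testing.
    """
    waited = 0.0
    while waited <= timeout:
        result = query(job_id)
        status = result["jobstatus"]
        if status == SUCCEEDED:
            return result
        if status == FAILED:
            raise RuntimeError(f"job {job_id} failed: {result.get('jobresult')}")
        # mirrors "Still waiting for job ... to complete" in the log
        sleep(interval)
        waited += interval
    raise TimeoutError(f"job {job_id} still pending after {timeout}s")
```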
However, when I removed the load, the autoscaler detected from the CPU metrics
that the new node was no longer needed and moved its pods back to the old node.
But when it came to deleting the new node, the operation failed. The logs for
that are below.
```
I0313 16:10:42.751483 1 scale_down.go:418] Skipping
voyager-control-17f83bf2095 from delete consideration - the node is marked as
no scale down
I0313 16:10:42.751496 1 scale_down.go:444] Node
voyager-node-17f83bf431c is not suitable for removal - memory utilization too
big (0.519572)
I0313 16:10:42.751526 1 scale_down.go:448] Node
voyager-node-17f83f4b273 - cpu utilization 0.150000
I0313 16:10:42.751546 1 cluster.go:139] voyager-node-17f83f4b273 for
removal
I0313 16:10:42.751581 1 cluster.go:322] Pod
default/php-apache-7656945b6b-nm8b6 can be moved to voyager-node-17f83bf431c
I0313 16:10:42.751594 1 cluster.go:167] node voyager-node-17f83f4b273
may be removed
I0313 16:10:42.751646 1 static_autoscaler.go:523]
voyager-node-17f83f4b273 is unneeded since 2022-03-13 16:00:39.61791118 +0000
UTC m=+1238.053807358 duration 10m3.096154674s
I0313 16:10:42.751677 1 static_autoscaler.go:534] Scale down status:
unneededOnly=false lastScaleUpTime=2022-03-13 15:45:25.988144618 +0000 UTC
m=+324.424040837 lastScaleDownDeleteTime=2022-03-13 14:40:24.439401321 +0000
UTC m=-3577.124702365 lastScaleDownFailTime=2022-03-13 14:40:24.439401321 +0000
UTC m=-3577.124702365 scaleDownForbidden=false isDeleteInProgress=false
scaleDownInCooldown=false
I0313 16:10:42.751693 1 static_autoscaler.go:547] Starting scale down
I0313 16:10:42.751715 1 scale_down.go:828] voyager-node-17f83f4b273
was unneeded for 10m3.096154674s
I0313 16:10:42.751731 1 cluster.go:139] voyager-node-17f83f4b273 for
removal
I0313 16:10:42.751768 1 cluster.go:322] Pod
default/php-apache-7656945b6b-nm8b6 can be moved to voyager-node-17f83bf431c
I0313 16:10:42.751781 1 cluster.go:167] node voyager-node-17f83f4b273
may be removed
I0313 16:10:42.751804 1 scale_down.go:980] Scale-down: removing node
voyager-node-17f83f4b273, utilization: {0.15 0.1082440988995888 0 cpu 0.15},
pods to reschedule: default/php-apache-7656945b6b-nm8b6
I0313 16:10:42.752137 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system",
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df",
APIVersion:"v1", ResourceVersion:"13552", FieldPath:""}): type: 'Normal'
reason: 'ScaleDown' Scale-down: removing node voyager-node-17f83f4b273,
utilization: {0.15 0.1082440988995888 0 cpu 0.15}, pods to reschedule:
default/php-apache-7656945b6b-nm8b6
I0313 16:10:42.763508 1 delete.go:103] Successfully added
ToBeDeletedTaint on node voyager-node-17f83f4b273
I0313 16:10:42.765309 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Node", Namespace:"",
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0",
APIVersion:"v1", ResourceVersion:"12899", FieldPath:""}): type: 'Normal'
reason: 'ScaleDown' marked the node as toBeDeleted/unschedulable
I0313 16:10:42.772269 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system",
Name:"weave-net-qjv74", UID:"526d282a-5c5e-41f9-af1c-5008730dfaae",
APIVersion:"v1", ResourceVersion:"9683", FieldPath:""}): type: 'Normal' reason:
'ScaleDown' deleting pod for node scale down
I0313 16:10:42.776486 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Pod", Namespace:"default",
Name:"php-apache-7656945b6b-nm8b6", UID:"b68a74d7-374a-4367-9784-737e3bdd3f80",
APIVersion:"v1", ResourceVersion:"9943", FieldPath:""}): type: 'Normal' reason:
'ScaleDown' deleting pod for node scale down
I0313 16:10:42.808539 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system",
Name:"kube-proxy-shdrg", UID:"33ad9fac-0cf5-495c-9147-062d874f3005",
APIVersion:"v1", ResourceVersion:"9660", FieldPath:""}): type: 'Normal' reason:
'ScaleDown' deleting pod for node scale down
E0313 16:10:42.830691 1 scale_down.go:1349] Not deleted yet
default/php-apache-7656945b6b-nm8b6
I0313 16:10:45.116519 1 reflector.go:536]
/home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356:
Watch close - *v1.StatefulSet total 7 items received
I0313 16:10:47.836224 1 scale_down.go:1360] All pods removed from
voyager-node-17f83f4b273
I0313 16:10:47.836668 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115&response=json&signature=***
I0313 16:10:47.849279 1 client.go:175] NewAPIRequest response status
code:431
I0313 16:10:47.851936 1 delete.go:197] Releasing taint
{Key:ToBeDeletedByClusterAutoscaler Value:1647187842 Effect:NoSchedule
TimeAdded:<nil>} on node voyager-node-17f83f4b273
I0313 16:10:47.860208 1 delete.go:228] Successfully released
ToBeDeletedTaint on node voyager-node-17f83f4b273
E0313 16:10:47.860245 1 scale_down.go:1006] Failed to delete
voyager-node-17f83f4b273: failed to delete voyager-node-17f83f4b273: Unable to
delete [b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error
code 9999) Unable to execute API command scalekubernetescluster due to invalid
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due
to incorrect long value format, or entity does not exist or due to incorrect
parameter annotation for the field in api cmd class.
I0313 16:10:47.860284 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Node", Namespace:"",
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0",
APIVersion:"v1", ResourceVersion:"12899", FieldPath:""}): type: 'Warning'
reason: 'ScaleDownFailed' failed to delete the node
I0313 16:10:50.050998 1 reflector.go:255] Listing and watching
*v1.CSIDriver from k8s.io/client-go/informers/factory.go:134
W0313 16:10:50.055262 1 reflector.go:324]
k8s.io/client-go/informers/factory.go:134: failed to list *v1.CSIDriver:
csidrivers.storage.k8s.io is forbidden: User
"system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource
"csidrivers" in API group "storage.k8s.io" at the cluster scope
E0313 16:10:50.055307 1 reflector.go:138]
k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver:
failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User
"system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource
"csidrivers" in API group "storage.k8s.io" at the cluster scope
I0313 16:10:52.766096 1 static_autoscaler.go:235] Starting main loop
I0313 16:10:52.766309 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=listKubernetesClusters&id=7c06e610-145c-4806-be2b-99f37a2b2b75&response=json&signature=***
I0313 16:10:52.802153 1 client.go:175] NewAPIRequest response status
code:200
I0313 16:10:52.802615 1 cloudstack_manager.go:88] Got cluster :
&{7c06e610-145c-4806-be2b-99f37a2b2b75 voyager 1 3 2 1 [0xc0010993b0
0xc0010993e0 0xc001099410]}
W0313 16:10:52.803058 1 clusterstate.go:590] Failed to get nodegroup
for 4cbcb62e-99ca-4d00-af22-e1eaae2ae691: Unable to find node
4cbcb62e-99ca-4d00-af22-e1eaae2ae691 in cluster
W0313 16:10:52.803075 1 clusterstate.go:590] Failed to get nodegroup
for 52b47c11-175a-441c-a5a4-96e83971f32a: Unable to find node
52b47c11-175a-441c-a5a4-96e83971f32a in cluster
W0313 16:10:52.803080 1 clusterstate.go:590] Failed to get nodegroup
for cec92edd-2c62-450b-8af5-ced062c7a84d: Unable to find node
cec92edd-2c62-450b-8af5-ced062c7a84d in cluster
I0313 16:10:52.803099 1 static_autoscaler.go:341] 3 unregistered nodes
present
I0313 16:10:52.803108 1 static_autoscaler.go:624] Removing
unregistered node cec92edd-2c62-450b-8af5-ced062c7a84d
W0313 16:10:52.803114 1 static_autoscaler.go:627] Failed to get node
group for cec92edd-2c62-450b-8af5-ced062c7a84d: Unable to find node
cec92edd-2c62-450b-8af5-ced062c7a84d in cluster
W0313 16:10:52.803120 1 static_autoscaler.go:346] Failed to remove
unregistered nodes: Unable to find node cec92edd-2c62-450b-8af5-ced062c7a84d in
cluster
```
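Separately, the repeated `is forbidden` messages mean the `cluster-autoscaler` service account lacks list/watch permission on some `storage.k8s.io` resources. They look unrelated to the scale-down failure, but can be silenced by extending the autoscaler's ClusterRole; a sketch of the extra rule (merge it into the existing ClusterRole rules in your deployment manifest):

```yaml
# Extra rule for the cluster-autoscaler ClusterRole (sketch):
# grants the read access the reflector was denied in the logs above.
- apiGroups: ["storage.k8s.io"]
  resources: ["csidrivers", "csistoragecapacities"]
  verbs: ["get", "list", "watch"]
```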
These are the logs after 20 minutes:
```
I0313 16:31:09.238844 1 static_autoscaler.go:547] Starting scale down
I0313 16:31:09.238889 1 scale_down.go:828] voyager-node-17f83f4b273
was unneeded for 10m3.256641915s
I0313 16:31:09.238905 1 scale_down.go:1102] Scale-down: removing empty
node voyager-node-17f83f4b273
I0313 16:31:09.239138 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system",
Name:"cluster-autoscaler-status", UID:"d56cd0a7-29fb-4347-b477-f12daac848df",
APIVersion:"v1", ResourceVersion:"17347", FieldPath:""}): type: 'Normal'
reason: 'ScaleDownEmpty' Scale-down: removing empty node
voyager-node-17f83f4b273
I0313 16:31:09.255891 1 delete.go:103] Successfully added
ToBeDeletedTaint on node voyager-node-17f83f4b273
I0313 16:31:09.256402 1 client.go:169] NewAPIRequest API request
URL:http://103.13.114.141:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=7c06e610-145c-4806-be2b-99f37a2b2b75&nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115&response=json&signature=***
I0313 16:31:09.267422 1 client.go:175] NewAPIRequest response status
code:431
E0313 16:31:09.269341 1 scale_down.go:1144] Problem with empty node
deletion: failed to delete voyager-node-17f83f4b273: Unable to delete
[b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error code
9999) Unable to execute API command scalekubernetescluster due to invalid
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due
to incorrect long value format, or entity does not exist or due to incorrect
parameter annotation for the field in api cmd class.
I0313 16:31:09.272042 1 delete.go:197] Releasing taint
{Key:ToBeDeletedByClusterAutoscaler Value:1647189069 Effect:NoSchedule
TimeAdded:<nil>} on node voyager-node-17f83f4b273
I0313 16:31:09.279999 1 delete.go:228] Successfully released
ToBeDeletedTaint on node voyager-node-17f83f4b273
I0313 16:31:09.306255 1 event_sink_logging_wrapper.go:48]
Event(v1.ObjectReference{Kind:"Node", Namespace:"",
Name:"voyager-node-17f83f4b273", UID:"56023857-197a-4844-ae26-443c541066b0",
APIVersion:"v1", ResourceVersion:"16596", FieldPath:""}): type: 'Warning'
reason: 'ScaleDownFailed' failed to delete empty node: failed to delete
voyager-node-17f83f4b273: Unable to delete
[b23c7e87-5732-65e0-3d9c-0459c34c2115] from cluster : (HTTP 431, error code
9999) Unable to execute API command scalekubernetescluster due to invalid
value. Invalid parameter nodeids value=b23c7e87-5732-65e0-3d9c-0459c34c2115 due
to incorrect long value format, or entity does not exist or due to incorrect
parameter annotation for the field in api cmd class.
```
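For debugging, it may help to compare the ID the autoscaler passed as `nodeids` with the ID CloudStack itself has for the worker VM. A hypothetical check (`cmk` is CloudMonkey; the names are taken from the logs above, and these commands obviously need a live cluster and management server):

```shell
# The ID the autoscaler sent to scaleKubernetesCluster:
#   nodeids=b23c7e87-5732-65e0-3d9c-0459c34c2115

# What CloudStack lists as the worker VM's ID:
cmk list virtualmachines name=voyager-node-17f83f4b273 filter=id,name

# What the kubelet reports for the same node (typically the SMBIOS system UUID):
kubectl get node voyager-node-17f83f4b273 \
  -o jsonpath='{.status.nodeInfo.systemUUID}{"\n"}'

# If these differ, the provider is resolving the node by the wrong identifier,
# which would fit the "entity does not exist" part of the HTTP 431 response.
```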
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]