[ 
https://issues.apache.org/jira/browse/YUNIKORN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842106#comment-17842106
 ] 

Peter Bacsko edited comment on YUNIKORN-2526 at 4/29/24 5:27 PM:
-----------------------------------------------------------------

[~shravan-achar] I looked at the logs and I have two remarks:

1) This is not the entire log file. I don't see the recovery part at all.
2) I do see "running predicates failed"; however, this seems to be perfectly 
normal:
{noformat}
2024-04-18T03:44:32.910Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"4d3e74e6-feb4-4c6c-ad7f-a8b920edb878", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.910Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"4d3e74e6-feb4-4c6c-ad7f-a8b920edb878", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.911Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"4d3e74e6-feb4-4c6c-ad7f-a8b920edb878", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.921Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"0e842ada-f4fa-48e4-9f43-db6c57506cb6", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.921Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"0e842ada-f4fa-48e4-9f43-db6c57506cb6", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.921Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"0e842ada-f4fa-48e4-9f43-db6c57506cb6", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}

2024-04-18T03:44:32.926Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b910f8aa-76bb-4d6c-8b3e-8b5b3f0d594d", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.926Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b910f8aa-76bb-4d6c-8b3e-8b5b3f0d594d", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.926Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b910f8aa-76bb-4d6c-8b3e-8b5b3f0d594d", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.930Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"7d738667-5248-4772-9198-589896b77c6d", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.930Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"7d738667-5248-4772-9198-589896b77c6d", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.930Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"7d738667-5248-4772-9198-589896b77c6d", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.934Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b2f42e11-529d-4cb2-b42e-47af44605e6b", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.934Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b2f42e11-529d-4cb2-b42e-47af44605e6b", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.934Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"b2f42e11-529d-4cb2-b42e-47af44605e6b", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}

2024-04-18T03:44:32.938Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"9d91401e-6912-41e6-b78a-f384e67d72d1", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.938Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"9d91401e-6912-41e6-b78a-f384e67d72d1", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.938Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"9d91401e-6912-41e6-b78a-f384e67d72d1", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.944Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"d5d5e82d-79c6-4648-9d64-eb348557ec03", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.944Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"d5d5e82d-79c6-4648-9d64-eb348557ec03", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.944Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"d5d5e82d-79c6-4648-9d64-eb348557ec03", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.948Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"83980870-df92-4da4-90b7-e71eab19df00", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.948Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"83980870-df92-4da4-90b7-e71eab19df00", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.948Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"83980870-df92-4da4-90b7-e71eab19df00", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.952Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"63fb79e6-b325-4f62-aa89-c88728273df0", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.954Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"63fb79e6-b325-4f62-aa89-c88728273df0", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.954Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"63fb79e6-b325-4f62-aa89-c88728273df0", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.961Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"85105e63-2f3e-4741-b741-c84b95bb9419", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.963Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"85105e63-2f3e-4741-b741-c84b95bb9419", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.964Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"85105e63-2f3e-4741-b741-c84b95bb9419", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
...
2024-04-18T03:44:32.968Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"31c9da47-696a-4329-b915-5e16a3793e1d", "nodeID": 
"ip-240-56-121-252.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.969Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"31c9da47-696a-4329-b915-5e16a3793e1d", "nodeID": 
"ip-240-56-185-12.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
2024-04-18T03:44:32.969Z        DEBUG   core.scheduler.node     
objects/node.go:403     running predicates failed       {"allocationKey": 
"31c9da47-696a-4329-b915-5e16a3793e1d", "nodeID": 
"ip-240-56-149-219.us-west-2.compute.internal", "allocateFlag": true, "error": 
"node(s) didn't match Pod's node affinity/selector"}
{noformat}
Basically, YuniKorn is trying to find a suitable node for each pod, but the 
following three nodes are never valid candidates because they don't match the 
pods' node affinity/selector:
{noformat}
ip-240-56-121-252.us-west-2.compute.internal
ip-240-56-185-12.us-west-2.compute.internal
ip-240-56-149-219.us-west-2.compute.internal
{noformat}
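
One way to confirm that only these nodes are being rejected is to tally the "running predicates failed" entries per node. Below is a minimal Python sketch, assuming the log has been saved with one entry per line (not wrapped as in this email) and that each entry ends with a JSON object as shown above. The names count_predicate_failures and make_entry, and the sample lines, are made up for illustration; only the field names come from the log snippet.

```python
import json
from collections import Counter

MESSAGE = "running predicates failed"


def count_predicate_failures(lines):
    """Count 'running predicates failed' entries per nodeID.

    Assumes each log entry sits on a single line and ends with a JSON
    object, as in the snippet above once the email wrapping is undone.
    """
    failures = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip truncated or still-wrapped entries
        node = fields.get("nodeID")
        if node is not None:
            failures[node] += 1
    return failures


def make_entry(allocation_key, node_id):
    """Build a hypothetical single-line entry with the same fields as above."""
    fields = {
        "allocationKey": allocation_key,
        "nodeID": node_id,
        "allocateFlag": True,
        "error": "node(s) didn't match Pod's node affinity/selector",
    }
    return ("2024-04-18T03:44:32.910Z\tDEBUG\tcore.scheduler.node\t"
            "objects/node.go:403\t" + MESSAGE + "\t" + json.dumps(fields))


# Hypothetical sample: three failures on two nodes plus one unrelated entry.
sample = [
    make_entry("4d3e74e6-feb4-4c6c-ad7f-a8b920edb878",
               "ip-240-56-121-252.us-west-2.compute.internal"),
    make_entry("4d3e74e6-feb4-4c6c-ad7f-a8b920edb878",
               "ip-240-56-185-12.us-west-2.compute.internal"),
    make_entry("0e842ada-f4fa-48e4-9f43-db6c57506cb6",
               "ip-240-56-121-252.us-west-2.compute.internal"),
    "2024-04-18T03:44:32.914Z\tINFO\tcore.scheduler.queue\tunrelated entry",
]

for node, count in count_predicate_failures(sample).most_common():
    print(node, count)
```

If the real log shows failures only for the three nodes listed above, and the counts are roughly equal across them, that would be consistent with every pod simply skipping those nodes during normal scheduling.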
Eventually, all requests got allocated:
{noformat}
2024-04-18T03:44:32.914Z        INFO    core.scheduler.queue    
objects/queue.go:1391   allocation found on queue       {"queueName": 
"root.yunikorn-scale", "appID": "21247d4b-3fda-49e6-a3f9-aa020f194210", 
"allocation": "applicationID=21247d4b-3fda-49e6-a3f9-aa020f194210, 
allocationID=4d3e74e6-feb4-4c6c-ad7f-a8b920edb878-0, 
allocationKey=4d3e74e6-feb4-4c6c-ad7f-a8b920edb878, Node=kwok-node-59zq5, 
result=Allocated"}
...
2024-04-18T03:44:32.923Z        INFO    core.scheduler.queue    
objects/queue.go:1391   allocation found on queue       {"queueName": 
"root.yunikorn-scale", "appID": "21247d4b-3fda-49e6-a3f9-aa020f194210", 
"allocation": "applicationID=21247d4b-3fda-49e6-a3f9-aa020f194210, 
allocationID=0e842ada-f4fa-48e4-9f43-db6c57506cb6-0, 
allocationKey=0e842ada-f4fa-48e4-9f43-db6c57506cb6, Node=kwok-node-59zq5, 
result=Allocated"}
{noformat}
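
To check this across the whole log rather than by eye, the allocation keys that hit a predicate failure can be compared with the keys that later show up with result=Allocated. This is a rough Python sketch under the same assumption of one log entry per line; the function name unallocated_keys and the all-zero key in the sample are hypothetical, and the parsing of the "allocation" string is based only on the two entries quoted above.

```python
import json
import re

# Matches the allocationKey inside the "allocation" summary string.
KEY_RE = re.compile(r"allocationKey=([0-9a-f-]+)")


def unallocated_keys(lines):
    """Return keys that failed predicates but never appear as Allocated."""
    failed, allocated = set(), set()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip truncated or still-wrapped entries
        if "running predicates failed" in line:
            key = fields.get("allocationKey")
            if key:
                failed.add(key)
        elif "allocation found on queue" in line:
            summary = fields.get("allocation", "")
            match = KEY_RE.search(summary)
            if match and "result=Allocated" in summary:
                allocated.add(match.group(1))
    return failed - allocated


# Hypothetical sample: one key that gets allocated, one made-up key that does not.
KEY_OK = "4d3e74e6-feb4-4c6c-ad7f-a8b920edb878"
KEY_MISSING = "00000000-0000-0000-0000-000000000000"  # not from the real log

sample = [
    "ts\tDEBUG\tcore.scheduler.node\trunning predicates failed\t"
    + json.dumps({"allocationKey": KEY_OK, "nodeID": "node-a"}),
    "ts\tDEBUG\tcore.scheduler.node\trunning predicates failed\t"
    + json.dumps({"allocationKey": KEY_MISSING, "nodeID": "node-a"}),
    "ts\tINFO\tcore.scheduler.queue\tallocation found on queue\t"
    + json.dumps({
        "queueName": "root.yunikorn-scale",
        "allocation": "allocationID=" + KEY_OK + "-0, allocationKey=" + KEY_OK
                      + ", Node=kwok-node-59zq5, result=Allocated",
    }),
]

print(sorted(unallocated_keys(sample)))
```

An empty result on the real log would support the reading that the predicate failures are harmless; any keys left over would be the ones worth a closer look.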
Are you sure this is the proper log file?


> Discrepancy between shim cache and core app/task list after scheduler restart
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2526
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2526
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Shravan Achar
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: log-snippet.txt, 
> logs-2be04314-bed0-4385-9ae7-50ed0ef9d9d5.txt.zip, 
> logs-49f01ed0-3473-4521-b11f-80e27adb7250.txt.zip, 
> logs-complete-post.txt.zip, state-dump-4-1-3.json, state-dump-4-17.json.zip
>
>
> When the scheduler restarts, it occasionally gets into a state where the 
> application is still in the Running state even though the application has 
> been terminated in the cluster. This is confirmed by the attached state dump.
>  
> The scheduler core logs (also attached) indicate that all nodes are being 
> evaluated for a non-existent application. CPU is being used up by this 
> unneeded evaluation.


