[ https://issues.apache.org/jira/browse/YUNIKORN-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865844#comment-17865844 ]

Wilfred Spiegelenburg commented on YUNIKORN-2687:
-------------------------------------------------

I agree with [~blue.tzuhua]: this is normal gang scheduling behaviour. With 
YuniKorn 1.3 and K8s 1.21 you are running on really old releases. The K8s 
version is out of support, and we have fixed numerous jiras since 1.3.

Analysis: it looks like K8s never came back with a confirmation that the 
placeholder for the driver had been released after we requested it:
{code:java}
2024-06-20T17:48:27.093Z        INFO        objects/application.go:668        
ask added successfully to application        {"appID": 
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": 
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta": 
"map[memory:1975517184 pods:1 vcore:1000]"}
...
2024-06-20T17:48:27.093Z        INFO        scheduler/partition.go:828        
scheduler replace placeholder processed        {"appID": 
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": 
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid": 
"a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid": 
"bdb020e8-708c-4eb7-b48c-fba16155941c"}
...
2024-06-20T17:48:27.094Z        INFO        cache/application.go:637        try 
to release pod from application        {"appID": 
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": 
"bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType": 
"PLACEHOLDER_REPLACED"}

{code}
That means we simply wait for that confirmation to arrive; the core cannot do 
more than that. It looks like you hit an issue on the K8s side... 
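For illustration, the handshake those log lines describe can be sketched roughly as follows. This is a minimal sketch, not the actual core code; all names are invented, and the UUIDs are shortened from the logs above:

```python
# Minimal sketch of the placeholder-replacement handshake described above.
# All names are invented for illustration; this is NOT the YuniKorn core code.

class PlaceholderReplacement:
    """Tracks one real ask that should replace a released placeholder."""

    def __init__(self, ask_key: str, placeholder_uuid: str):
        self.ask_key = ask_key                    # the real pod's allocation key
        self.placeholder_uuid = placeholder_uuid  # placeholder being released
        self.release_confirmed = False            # set once the shim reports back

    def request_release(self) -> None:
        # The core asks the shim (and through it, K8s) to delete the
        # placeholder pod, then takes no further action for this ask.
        print(f"release requested for placeholder {self.placeholder_uuid}")

    def confirm_release(self) -> None:
        # Driven by the pod-deleted confirmation from K8s; if this event
        # never arrives, the real pod stays pending, as seen in this issue.
        self.release_confirmed = True

    def can_place_real_pod(self) -> bool:
        return self.release_confirmed


# Usage mirroring the log excerpt (UUIDs shortened):
repl = PlaceholderReplacement("d42081ec", "bdb020e8")
repl.request_release()
print(repl.can_place_real_pod())  # still waiting on K8s
repl.confirm_release()
print(repl.can_place_real_pod())
```

The key point the sketch makes: after the release request the core is purely event-driven, so progress depends entirely on the pod-deleted confirmation coming back from the K8s side.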

> Placeholder Timeout and Replacement Failure in Gang Scheduling
> --------------------------------------------------------------
>
>                 Key: YUNIKORN-2687
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2687
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: huangzhir
>            Assignee: Tzu-Hua Lan
>            Priority: Blocker
>
> h1. *Description:*
> When using gang scheduling with YuniKorn, the driver pod encounters a 
> placeholder timeout, leading to a failure in replacement. The pod shows a 
> pending status for approximately 60 seconds.
> h2. *Observed Behavior:*
>  * The driver pod ({{spark-pi-d86d1d9036b8e8e9-driver}}) is queued and 
> waiting for allocation.
>  * The pod belongs to the {{spark-driver}} task group and is scheduled as a 
> gang member.
>  * A warning indicating "Placeholder timed out" is logged, and the 
> placeholder is not replaced successfully.
>  * The pod is eventually assigned and bound to a node, and the task completes.
>  * There is a 60-second pending period observed for the driver pod.
> h2. *Pod Status:*
> {code:java}
> kubectl get pod -n spark
> NAME                                               READY   STATUS        
> RESTARTS   AGE
> spark-pi-6d2eea9036f9c838-driver                   0/1     Pending       0    
>       61s
> tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0     1/1     Terminating   0    
>       60s
> tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0   1/1     Running       0    
>       60s
> kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark
> ......
>   Type     Reason                 Age    From      Message
>   ----     ------                 ----   ----      -------
>   Normal   Scheduling             2m52s  yunikorn  
> spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation
>   Normal   GangScheduling         2m52s  yunikorn  Pod belongs to the 
> taskGroup spark-driver, it will be scheduled as a gang member
>   Warning  Placeholder timed out  113s   yunikorn  Application 
> spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out
>   Normal   Scheduled              100s   yunikorn  Successfully assigned 
> spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66
>   Normal   PodBindSuccessful      100s   yunikorn  Pod 
> spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node 
> 10.10.10.66
>   Normal   TaskCompleted          50s    yunikorn  Task 
> spark/spark-pi-d86d1d9036b8e8e9-driver is completed
>   Normal   Pulled                 99s    kubelet   Container image 
> "apache/spark:v3.3.2" already present on machine
>   Normal   Created                99s    kubelet   Created container 
> spark-kubernetes-driver
>   Normal   Started                99s    kubelet   Started container 
> spark-kubernetes-driver{code}
> h2. *Scheduler Logs:*
> {code:java}
> 2024-06-20T17:49:26.093Z        INFO        objects/application.go:440        
> Placeholder timeout, releasing placeholders        {"AppID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1, 
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMReleaseAllocationEvent", 
> "event": 
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
>  allocated placeholders on placeholder 
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]}, 
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z        DEBUG        
> callback/scheduler_callback.go:48        UpdateAllocation callback received   
>      {"UpdateAllocationResponse": "released:<partitionName:\"default\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT 
> message:\"releasing allocated placeholders on placeholder timeout\" 
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z        DEBUG        
> callback/scheduler_callback.go:81        callback: response to released 
> allocations        {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z        DEBUG        cache/context.go:620        
> forget pod        {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:501  
>       Removing assumed pod from cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z        DEBUG        objects/application.go:1947      
>   Successfully synced shim on released allocations. response: no. of 
> allocations: 1
> 2024-06-20T17:49:26.093Z        DEBUG        objects/application.go:409       
>  Application placeholder timer cleared        {"AppID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z        INFO        cache/application.go:637        
> try to release pod from application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z        DEBUG        cache/application.go:692        
> trying to send placeholder timeout events to the original pod from 
> application        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app 
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver  
> spark  d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23 
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi 
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver 
> spark-version:3.3.2] 
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb 
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60 
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver 
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\": 
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\": 
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\", 
> \"memory\": \"2Gi\"} }]] [] []  [{okhttp Update v1 2024-06-20 17:48:23 +0000 
> UTC FieldsV1 
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:
schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>  
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607
,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
>  --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.SparkPi 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar 
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
>  {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi 
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: 
> {{1975517184 0} {<nil>} 1884Mi 
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},Nomina
tedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
>  "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType": 
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests ... {code}
> h2. *Reproduction Steps:*
> Submit the Spark job with the following command:
> {code:java}
> ./bin/spark-submit \
>   --master k8s://https://10.10.10.10:6443 \
>   --deploy-mode cluster \
>   --name spark-pi \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \
>   --conf spark.kubernetes.namespace=spark \
>   --class org.apache.spark.examples.SparkPi \
>   --conf spark.executor.instances=1 \
>   --conf spark.executor.cores=1 \
>   --conf spark.executor.memory=1500m \
>   --conf spark.driver.cores=1 \
>   --conf spark.driver.memory=1500m \
>   --conf spark.kubernetes.driver.limit.cores=1 \
>   --conf spark.kubernetes.driver.limit.memory=2G \
>   --conf spark.kubernetes.executor.limit.cores=1 \
>   --conf spark.kubernetes.executor.limit.memory=2G \
>   --conf spark.kubernetes.driver.label.app=spark \
>   --conf spark.kubernetes.executor.label.app=spark \
>   --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
>   --conf spark.kubernetes.scheduler.name=yunikorn \
>   --conf spark.kubernetes.driver.label.queue=root.spark \
>   --conf spark.kubernetes.executor.label.queue=root.spark \
>   --conf 
> spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
>   --conf 
> spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
>   --conf 
> spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver
>  \
>   --conf 
> spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": 
> "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": 
> "2Gi"},"nodeSelector": {"app": "spark"} }, {"name": "spark-executor", 
> "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"},"nodeSelector": 
> {"app": "spark"} }]' \
>   --conf 
> spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=60
>  gangSchedulingStyle=Hard' \
>   --conf 
> spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor
>  \
>   local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar \
>   3000 {code}
> h2. Environment:
>  * YuniKorn version: 1.3.0
>  * Kubernetes version: 1.21.3
>  * Spark version: 3.3.2
> h2. *resource-quota.yaml*
>  *None*
> h2. *yunikorn-configs.yaml*
> {code:java}
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
>   namespace: yunikorn
> data:
>   log.level: "-1"
>   log.admission.level: "-1"
>   log.core.config.level: "-1"
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: tag
>             value: namespace
>             create: true
>         queues:
>           - name: root
>             submitacl: '*'
>             properties:
>               application.sort.policy: fifo
>               placeholderTimeoutInSeconds: 60
>               schedulingStyle: Hard
>             queues:
>               - name: spark
>                 properties:
>                   application.sort.policy: fifo
>                   placeholderTimeoutInSeconds: 60
>                   schedulingStyle: Hard
>                 resources:
>                   guaranteed:
>                     vcore: 10
>                     memory: 10Gi
>                   max:
>                     vcore: 10
>                     memory: 10Gi {code}
> h2. *Scheduler Logs (from submission):*
> {code:java}
> 2024-06-20T17:48:23.144Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:23.744Z        DEBUG        cache/context.go:233        
> adding pod to cache        {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:23.744Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Pre)        {"nodes": 3, "pods": 55, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":2,"Running":53}}
> 2024-06-20T17:48:23.744Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "spark-pi-0671699036c3885c-driver", "podKey": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z        DEBUG        general/general.go:141        
> pod added        {"appType": "general", "Name": 
> "spark-pi-0671699036c3885c-driver", "Namespace": "spark"}
> 2024-06-20T17:48:23.744Z        DEBUG        utils/utils.go:305        Unable 
> to retrieve user name from pod labels. Empty user label        {"userLabel": 
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:23.744Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Post)        {"nodes": 3, "pods": 56, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:23.744Z        DEBUG        cache/context.go:737        
> AddApplication        {"Request": 
> {"Metadata":{"ApplicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","QueueName":"root.spark","User":"nobody","Tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
>  gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\": 
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\", 
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1, 
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} 
> }]"},"Groups":null,"TaskGroups":[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}},{"name":"spark-executor","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}}],"OwnerReferences":[{"apiVersion":"v1","kind":"Pod","name":"spark-pi-0671699036c3885c-driver","uid":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","controller":false,"blockOwnerDeletion":true}],"SchedulingPolicyParameters":{},"CreationTime":0}}}
> 2024-06-20T17:48:23.744Z        DEBUG        cache/context.go:746        app 
> namespace info        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "namespace": "spark"}
> 2024-06-20T17:48:23.744Z        INFO        cache/context.go:773        app 
> added        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:23.744Z        DEBUG        cache/context.go:841        
> AddTask        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z        INFO        cache/context.go:863        task 
> added        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskState": "New"}
> 2024-06-20T17:48:23.744Z        INFO        cache/context.go:873        app 
> request originating pod added        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "original task": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:24.090Z        DEBUG        cache/application_state.go:508   
>      shim app state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination": 
> "Submitted", "event": "SubmitApplication"}
> 2024-06-20T17:48:24.090Z        INFO        cache/application.go:424        
> handle app submission        {"app": "applicationID: 
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, queue: root.spark, partition: 
> default, totalNumOfTasks: 1, currentState: Submitted", "clusterID": 
> "mycluster"}
> 2024-06-20T17:48:24.090Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateApplicationEvent", 
> "event": 
> {"Request":{"new":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","queueName":"root.spark","partitionName":"[mycluster]default","ugi":{"user":"nobody"},"tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
>  gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\": 
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\", 
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1, 
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} 
> }]"},"executionTimeoutMilliSeconds":60000,"placeholderAsk":{"resources":{"memory":{"value":4294967296},"pods":{"value":2},"vcore":{"value":2000}}},"gangSchedulingStyle":"Hard"}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z        DEBUG        placement/placement.go:145       
>  Executing rule for placing application        {"ruleName": "tag", 
> "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z        DEBUG        placement/tag_rule.go:106        
> Tag rule intermediate result        {"application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z        INFO        placement/tag_rule.go:115        
> Tag rule application placed        {"application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z        DEBUG        placement/placement.go:204       
>  Rule result for placing application        {"application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queueName": "root.spark"}
> 2024-06-20T17:48:24.090Z        INFO        scheduler/context.go:549        
> Added application to partition        {"applicationID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "partitionName": 
> "[mycluster]default", "requested queue": "root.spark", "placed queue": 
> "root.spark"}
> 2024-06-20T17:48:24.090Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMApplicationUpdateEvent", 
> "event": 
> {"RmID":"mycluster","AcceptedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}],"RejectedApplications":[],"UpdatedApplications":null},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z        DEBUG        
> callback/scheduler_callback.go:108        UpdateApplication callback received 
>        {"UpdateApplicationResponse": 
> "accepted:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > "}
> 2024-06-20T17:48:24.090Z        DEBUG        
> callback/scheduler_callback.go:114        callback: response to accepted 
> application        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z        INFO        
> callback/scheduler_callback.go:118        Accepting app        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z        DEBUG        cache/application_state.go:508   
>      shim app state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Submitted", 
> "destination": "Accepted", "event": "AcceptApplication"}
> 2024-06-20T17:48:24.145Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:25.091Z        DEBUG        cache/application.go:516        
> postAppAccepted on cached app        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "numTaskGroups": 2, 
> "numAllocatedTasks": 0}
> 2024-06-20T17:48:25.091Z        INFO        cache/application.go:526        
> app has taskGroups defined, trying to reserve resources for gang members      
>   {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:25.091Z        DEBUG        cache/application_state.go:508   
>      shim app state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted", 
> "destination": "Reserving", "event": "TryReserve"}
> 2024-06-20T17:48:25.102Z        INFO        cache/placeholder_manager.go:103  
>       placeholder created        {"placeholder": "appID: 
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-driver, podName: 
> spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z        DEBUG        general/general.go:141        
> pod added        {"appType": "general", "Name": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.102Z        DEBUG        utils/utils.go:305        Unable 
> to retrieve user name from pod labels. Empty user label        {"userLabel": 
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.102Z        DEBUG        cache/context.go:841        
> AddTask        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z        INFO        cache/context.go:863        task 
> added        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskState": "New"}
> 2024-06-20T17:48:25.102Z        DEBUG        cache/context.go:233        
> adding pod to cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Pre)        {"nodes": 3, "pods": 56, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:25.102Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Post)        {"nodes": 3, "pods": 57, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z        INFO        cache/placeholder_manager.go:103  
>       placeholder created        {"placeholder": "appID: 
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-executor, podName: 
> spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z        DEBUG        general/general.go:141        
> pod added        {"appType": "general", "Name": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.105Z        DEBUG        utils/utils.go:305        Unable 
> to retrieve user name from pod labels. Empty user label        {"userLabel": 
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.105Z        DEBUG        cache/context.go:841        
> AddTask        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z        INFO        cache/context.go:863        task 
> added        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskState": "New"}
> 2024-06-20T17:48:25.105Z        DEBUG        cache/context.go:233        
> adding pod to cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Pre)        {"nodes": 3, "pods": 57, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AddPod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:25.146Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:26.092Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias": 
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New", 
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New", 
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias": 
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Pending", 
> "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z        DEBUG        cache/task.go:275        
> scheduling pod        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z        DEBUG        cache/task.go:294        send 
> update request        {"request": 
> "asks:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > > 
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\" 
> value:<value:1000 > > > maxAllocations:1 
> tags:<key:\"kubernetes.io/label/applicationId\" 
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > 
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" > 
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" > 
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" > 
> tags:<key:\"kubernetes.io/meta/podName\" 
> value:\"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0\" > 
> taskGroupName:\"spark-driver\" placeholder:true 
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > > 
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": 
> "Pending", "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z        DEBUG        cache/task.go:275        
> scheduling pod        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z        DEBUG        cache/task.go:294        send 
> update request        {"request": 
> "asks:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > > 
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\" 
> value:<value:1000 > > > maxAllocations:1 
> tags:<key:\"kubernetes.io/label/applicationId\" 
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > 
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" > 
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" > 
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" > 
> tags:<key:\"kubernetes.io/meta/podName\" 
> value:\"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0\" > 
> taskGroupName:\"spark-executor\" placeholder:true 
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > > 
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateAllocationEvent", 
> "event": 
> {"Request":{"asks":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-driver","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:26.092Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateAllocationEvent", 
> "event": 
> {"Request":{"asks":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-executor","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
>  "currentQueueSize": 1}
> 2024-06-20T17:48:26.092Z        INFO        objects/application_state.go:133  
>       Application state transition        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination": 
> "Accepted", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMApplicationUpdateEvent", 
> "event": 
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Accepted","stateTransitionTimestamp":1718905706092997053,"message":"runApplication"}]},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z        INFO        objects/application.go:668        
> ask added successfully to application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "placeholder": true, "pendingDelta": 
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        INFO        objects/application.go:668        
> ask added successfully to application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "placeholder": true, "pendingDelta": 
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:108        UpdateApplication callback received 
>        {"UpdateApplicationResponse": 
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> state:\"Accepted\" stateTransitionTimestamp:1718905706092997053 
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:137        status update callback received     
>    {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status": 
> "Accepted"}
> 2024-06-20T17:48:26.093Z        DEBUG        objects/application.go:397       
>  Application placeholder timer initiated        {"AppID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/manager.go:63        
> Increasing resource usage        {"user": "nobody", "queue path": 
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/manager.go:265        Group 
> tracker already exists and linking (reusing) the same with application        
> {"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue path": 
> "root.spark", "user": "nobody", "group": "nobody"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        objects/queue.go:1239        
> allocation found on queue        {"queueName": "root.spark", "appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation": 
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb, 
> uuid=bdb020e8-708c-4eb7-b48c-fba16155941c, 
> allocationKey=470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7, Node=10.10.10.66, 
> result=Allocated"}
> 2024-06-20T17:48:26.093Z        INFO        scheduler/partition.go:888        
> scheduler allocation processed        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "uuid": 
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "allocatedResource": 
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true, 
> "targetNode": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMNewAllocationsEvent", "event": 
> {"RmID":"mycluster","Allocations":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.66","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-driver","placeholder":true}]},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:48        UpdateAllocation callback received   
>      {"UpdateAllocationResponse": 
> "new:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" 
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\" 
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > > 
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\" 
> value:<value:1000 > > > nodeID:\"10.10.10.66\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> taskGroupName:\"spark-driver\" placeholder:true > "}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:53        callback: response to new allocation 
>        {"allocationKey": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "UUID": 
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "applicationID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AssumePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z        DEBUG        external/scheduler_cache.go:476  
>       Adding assumed pod to cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "node": "10.10.10.66", "allBound": 
> true}
> 2024-06-20T17:48:26.093Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AssumePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z        DEBUG        scheduler/context.go:853        
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.093Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias": 
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": 
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.093Z        DEBUG        cache/task.go:349        bind 
> pod volumes        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z        INFO        cache/context.go:499        
> Binding Pod Volumes skipped: all volumes already bound        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.093Z        DEBUG        cache/task.go:362        bind 
> pod        {"podName": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", 
> "podUID": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z        INFO        client/kubeclient.go:112        
> bind pod to node        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/manager.go:63        
> Increasing resource usage        {"user": "nobody", "queue path": 
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        ugm/queue_tracker.go:57        
> Increasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z        DEBUG        objects/application.go:339       
>  Application state timer initiated        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "state": "Starting", "timeout": 
> "5m0s"}
> 2024-06-20T17:48:26.093Z        INFO        objects/application_state.go:133  
>       Application state transition        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted", 
> "destination": "Starting", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMApplicationUpdateEvent", 
> "event": 
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Starting","stateTransitionTimestamp":1718905706093901659,"message":"runApplication"}]},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z        DEBUG        objects/queue.go:1239        
> allocation found on queue        {"queueName": "root.spark", "appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation": 
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb, 
> uuid=c834ce9b-c084-4c80-aa38-1f9e4bfe8020, 
> allocationKey=4c776fc0-e9c0-4279-806f-8386508d5202, Node=10.10.10.22, 
> result=Allocated"}
> 2024-06-20T17:48:26.093Z        INFO        scheduler/partition.go:888        
> scheduler allocation processed        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "uuid": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "allocatedResource": 
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true, 
> "targetNode": "10.10.10.22"}
> 2024-06-20T17:48:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMNewAllocationsEvent", "event": 
> {"RmID":"mycluster","Allocations":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.22","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-executor","placeholder":true}]},
>  "currentQueueSize": 1}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:108        UpdateApplication callback received 
>        {"UpdateApplicationResponse": 
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> state:\"Starting\" stateTransitionTimestamp:1718905706093901659 
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:137        status update callback received     
>    {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status": 
> "Starting"}
> 2024-06-20T17:48:26.093Z        DEBUG        
> callback/scheduler_callback.go:48        UpdateAllocation callback received   
>      {"UpdateAllocationResponse": 
> "new:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" 
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" 
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > > 
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\" 
> value:<value:1000 > > > nodeID:\"10.10.10.22\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> taskGroupName:\"spark-executor\" placeholder:true > "}
> 2024-06-20T17:48:26.094Z        DEBUG        
> callback/scheduler_callback.go:53        callback: response to new allocation 
>        {"allocationKey": "4c776fc0-e9c0-4279-806f-8386508d5202", "UUID": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "applicationID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.094Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AssumePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z        DEBUG        external/scheduler_cache.go:476  
>       Adding assumed pod to cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "node": "10.10.10.22", "allBound": 
> true}
> 2024-06-20T17:48:26.094Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (AssumePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z        DEBUG        scheduler/context.go:853        
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.094Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": 
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.094Z        DEBUG        cache/task.go:349        bind 
> pod volumes        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z        INFO        cache/context.go:499        
> Binding Pod Volumes skipped: all volumes already bound        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.094Z        DEBUG        cache/task.go:362        bind 
> pod        {"podName": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", 
> "podUID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z        INFO        client/kubeclient.go:112        
> bind pod to node        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.097Z        INFO        cache/task.go:375        
> successfully bound pod        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z        INFO        cache/task.go:427        
> placeholder is bound        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName": 
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName": 
> "spark-driver"}
> 2024-06-20T17:48:26.098Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias": 
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": 
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z        INFO        cache/task.go:375        
> successfully bound pod        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z        INFO        cache/task.go:427        
> placeholder is bound        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName": 
> "spark-executor"}
> 2024-06-20T17:48:26.098Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": 
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z        DEBUG        cache/application_state.go:508   
>      shim app state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Reserving", 
> "destination": "Running", "event": "RunApplication"}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.098Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.107Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.147Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:27.002Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.002Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.002Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.014Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias": 
> "spark/spark-pi-0671699036c3885c-driver", "source": "New", "destination": 
> "Pending", "event": "InitTask"}
> 2024-06-20T17:48:27.093Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias": 
> "spark/spark-pi-0671699036c3885c-driver", "source": "Pending", "destination": 
> "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:27.093Z        DEBUG        cache/task.go:275        
> scheduling pod        {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:27.093Z        DEBUG        cache/task.go:294        send 
> update request        {"request": 
> "asks:<allocationKey:\"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> resourceAsk:<resources:<key:\"memory\" value:<value:1975517184 > > 
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\" 
> value:<value:1000 > > > maxAllocations:1 
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" > 
> tags:<key:\"kubernetes.io/label/spark-app-name\" value:\"spark-pi\" > 
> tags:<key:\"kubernetes.io/label/spark-app-selector\" 
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > 
> tags:<key:\"kubernetes.io/label/spark-role\" value:\"driver\" > 
> tags:<key:\"kubernetes.io/label/spark-version\" value:\"3.3.2\" > 
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" > 
> tags:<key:\"kubernetes.io/meta/podName\" 
> value:\"spark-pi-0671699036c3885c-driver\" > taskGroupName:\"spark-driver\" 
> Originator:true preemptionPolicy:<allowPreemptSelf:true 
> allowPreemptOther:true > > rmID:\"mycluster\" "}
> 2024-06-20T17:48:27.093Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateAllocationEvent", 
> "event": 
> {"Request":{"asks":[{"allocationKey":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":1975517184},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/queue":"root.spark","kubernetes.io/label/spark-app-name":"spark-pi","kubernetes.io/label/spark-app-selector":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/spark-role":"driver","kubernetes.io/label/spark-version":"3.3.2","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"spark-pi-0671699036c3885c-driver"},"taskGroupName":"spark-driver","Originator":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z        INFO        objects/application.go:668        
> ask added successfully to application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta": 
> "map[memory:1975517184 pods:1 vcore:1000]"}
> 2024-06-20T17:48:27.093Z        DEBUG        objects/queue.go:1275        
> allocation found on queue        {"queueName": "root.spark", "appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation": 
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb, 
> uuid=a53e9cbb-931d-4d1d-95e5-8e2425ba95be, 
> allocationKey=d42081ec-a8c4-4fcb-8e40-e4739a67fbfe, Node=10.10.10.66, 
> result=Replaced"}
> 2024-06-20T17:48:27.093Z        INFO        scheduler/partition.go:828        
> scheduler replace placeholder processed        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey": 
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid": 
> "a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid": 
> "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMReleaseAllocationEvent", 
> "event": 
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","terminationType":4,"message":"replacing
>  uuid: 
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be","allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}]},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z        DEBUG        
> callback/scheduler_callback.go:48        UpdateAllocation callback received   
>      {"UpdateAllocationResponse": "released:<partitionName:\"default\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\" 
> terminationType:PLACEHOLDER_REPLACED message:\"replacing uuid: 
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be\" 
> allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" > "}
> 2024-06-20T17:48:27.093Z        DEBUG        
> callback/scheduler_callback.go:81        callback: response to released 
> allocations        {"UUID": "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z        DEBUG        cache/context.go:620        
> forget pod        {"pod": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:27.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z        DEBUG        external/scheduler_cache.go:501  
>       Removing assumed pod from cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z        DEBUG        scheduler/context.go:884        
> Successfully synced shim on released allocations. response: no. of 
> allocations: 1
> 2024-06-20T17:48:27.094Z        INFO        cache/application.go:637        
> try to release pod from application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": 
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType": 
> "PLACEHOLDER_REPLACED"}
> 2024-06-20T17:48:27.099Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.099Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.099Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.147Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:27.290Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z        DEBUG        external/scheduler_cache.go:179  
>       Updating node in cache        {"nodeName": "10.10.10.22"}
> 2024-06-20T17:48:27.290Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z        DEBUG        cache/node.go:109        set 
> node capacity        {"nodeID": "10.10.10.22", "capacity": 
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > > 
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\" 
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5963 > > 
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34165359566 > > 
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > > 
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > > 
> resources:<key:\"memory\" value:<value:64713949184 > > 
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\" 
> value:<value:14000 > > "}
> 2024-06-20T17:48:27.290Z        INFO        cache/nodes.go:179        Node's 
> ready status flag        {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:48:27.290Z        INFO        cache/nodes.go:184        report 
> updated nodes to scheduler        {"request": 
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:48:27.290Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateNodeEvent", "event": 
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:48:27.290Z        INFO        objects/queue.go:1190        
> updating root queue max resources        {"current max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:20075 kubernetes.io/batch-memory:109430536751 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]", "new max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:48:27.290Z        DEBUG        objects/node.go:182        skip 
> updating occupiedResource, not changed
> 2024-06-20T17:48:27.421Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.421Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.421Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.906Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:48:28.148Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> ...
> 2024-06-20T17:48:36.154Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:48:37.131Z        DEBUG        scheduler/health_checker.go:179  
>       Scheduler is healthy        {"health check values": 
> [{"Name":"Scheduling errors","Succeeded":true,"Description":"Check for 
> scheduling error entries in metrics","DiagnosisMessage":"There were 0 
> scheduling errors logged in the metrics"},{"Name":"Failed 
> nodes","Succeeded":true,"Description":"Check for failed nodes entries in 
> metrics","DiagnosisMessage":"There were 0 failed nodes logged in the 
> metrics"},{"Name":"Negative resources","Succeeded":true,"Description":"Check 
> for negative resources in the partitions","DiagnosisMessage":"Partitions with 
> negative resources: []"},{"Name":"Negative 
> resources","Succeeded":true,"Description":"Check for negative resources in 
> the nodes","DiagnosisMessage":"Nodes with negative resources: 
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if a 
> node's allocated resource <= total resource of the 
> node","DiagnosisMessage":"Nodes with inconsistent data: 
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if 
> total partition resource == sum of the node resources from the 
> partition","DiagnosisMessage":"Partitions with inconsistent data: 
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if 
> node total resource = allocated resource + occupied resource + available 
> resource","DiagnosisMessage":"Nodes with inconsistent data: 
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if 
> node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes 
> with inconsistent data: []"},{"Name":"Reservation 
> check","Succeeded":true,"Description":"Check the reservation nr compared to 
> the number of nodes","DiagnosisMessage":"Reservation/node nr ratio: 
> [0.000000]"},{"Name":"Orphan allocation on node 
> check","Succeeded":true,"Description":"Check if there are orphan allocations 
> on the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan 
> allocation on app check","Succeeded":true,"Description":"Check if there are 
> orphan allocations on the 
> applications","DiagnosisMessage":"OrphanAllocations: []"}]}
> 2024-06-20T17:48:37.154Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> ...
> 2024-06-20T17:49:06.174Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:07.131Z        DEBUG        metrics/metrics_collector.go:68  
>       Adding current status to historical partition data
> 2024-06-20T17:49:07.131Z        DEBUG        scheduler/health_checker.go:179  
>       Scheduler is healthy        {"health check values": ...}
> 2024-06-20T17:49:07.174Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> ...
> 2024-06-20T17:49:13.179Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:13.199Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z        DEBUG        external/scheduler_cache.go:179  
>       Updating node in cache        {"nodeName": "10.10.10.66"}
> 2024-06-20T17:49:13.200Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z        DEBUG        cache/node.go:109        set 
> node capacity        {"nodeID": "10.10.10.66", "capacity": 
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > > 
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\" 
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5926 > > 
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34326439513 > > 
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:2110 > > 
> resources:<key:\"kubernetes.io/mid-memory\" value:<> > 
> resources:<key:\"memory\" value:<value:64713940992 > > 
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\" 
> value:<value:14000 > > "}
> 2024-06-20T17:49:13.200Z        INFO        cache/nodes.go:179        Node's 
> ready status flag        {"Node name": "10.10.10.66", "ready": true}
> 2024-06-20T17:49:13.200Z        INFO        cache/nodes.go:184        report 
> updated nodes to scheduler        {"request": 
> {"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:13.200Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateNodeEvent", "event": 
> {"Request":{"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:49:13.200Z        INFO        objects/queue.go:1190        
> updating root queue max resources        {"current max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]", "new max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:13.200Z        DEBUG        objects/node.go:182        skip 
> updating occupiedResource, not changed
> 2024-06-20T17:49:14.179Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> ...
> 2024-06-20T17:49:25.186Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:26.093Z        INFO        objects/application.go:440        
> Placeholder timeout, releasing placeholders        {"AppID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1, 
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z        DEBUG        rmproxy/rmproxy.go:59        
> enqueue event        {"eventType": "*rmevent.RMReleaseAllocationEvent", 
> "event": 
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
>  allocated placeholders on placeholder 
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]}, 
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z        DEBUG        
> callback/scheduler_callback.go:48        UpdateAllocation callback received   
>      {"UpdateAllocationResponse": "released:<partitionName:\"default\" 
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" 
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT 
> message:\"releasing allocated placeholders on placeholder timeout\" 
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z        DEBUG        
> callback/scheduler_callback.go:81        callback: response to released 
> allocations        {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z        DEBUG        cache/context.go:620        
> forget pod        {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:501  
>       Removing assumed pod from cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (ForgetPod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z        DEBUG        objects/application.go:1947      
>   Successfully synced shim on released allocations. response: no. of 
> allocations: 1
> 2024-06-20T17:49:26.093Z        DEBUG        objects/application.go:409       
>  Application placeholder timer cleared        {"AppID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z        INFO        cache/application.go:637        
> try to release pod from application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z        DEBUG        cache/application.go:692        
> trying to send placeholder timeout events to the original pod from 
> application        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app 
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver  
> spark  d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23 
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi 
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver 
> spark-version:3.3.2] 
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb 
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60 
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver 
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\": 
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\": 
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\", 
> \"memory\": \"2Gi\"} }]] [] []  [{okhttp Update v1 2024-06-20 17:48:23 +0000 
> UTC FieldsV1 
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:
schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>  
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607
,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
>  --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.SparkPi 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar 
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
>  {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi 
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory: 
> {{1975517184 0} {<nil>} 1884Mi 
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},Nomina
tedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
>  "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType": 
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:26.223Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.223Z        DEBUG        external/scheduler_cache.go:411  
>       Putting pod in cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.223Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdatePod.Post)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z        DEBUG        cache/context.go:272        
> Request to update terminated pod, removing from cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.488Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (RemovePod.Pre)        {"nodes": 3, "pods": 58, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57, 
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z        DEBUG        external/scheduler_cache.go:434  
>       Removing deleted pod from cache        {"podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey": 
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z        INFO        general/general.go:170        
> task completes        {"appType": "general", "namespace": "spark", "podName": 
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "podStatus": "Succeeded"}
> 2024-06-20T17:49:26.488Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (RemovePod.Post)        {"nodes": 3, "pods": 57, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56, 
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:26.488Z        DEBUG        cache/context.go:679        
> NotifyTaskComplete        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z        DEBUG        cache/context.go:683        
> release allocation        {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z        DEBUG        cache/task.go:479        prepare 
> to send release request        {"applicationID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "allocationUUID": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "task": "Bound", "terminationType": 
> "TIMEOUT"}
> 2024-06-20T17:49:26.489Z        INFO        cache/task.go:510        
> releasing allocations        {"numOfAsksToRelease": 0, 
> "numOfAllocationsToRelease": 1}
> 2024-06-20T17:49:26.489Z        INFO        cache/task_state.go:380        
> Task state transition        {"app": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task": 
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias": 
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Bound", 
> "destination": "Completed", "event": "CompleteTask"}
> 2024-06-20T17:49:26.489Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateAllocationEvent", 
> "event": 
> {"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"task
>  completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
> 2024-06-20T17:49:26.489Z        INFO        scheduler/partition.go:1245       
>  removing allocation from application        {"appID": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationId": 
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.489Z        DEBUG        ugm/manager.go:124        
> Decreasing resource usage        {"user": "nobody", "queue path": 
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", 
> "resource": "map[memory:2147483648 pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z        DEBUG        ugm/queue_tracker.go:82        
> Decreasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z        DEBUG        ugm/queue_tracker.go:82        
> Decreasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z        DEBUG        ugm/queue_tracker.go:82        
> Decreasing resource usage        {"queue path": "root.spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z        DEBUG        ugm/queue_tracker.go:82        
> Decreasing resource usage        {"queue path": "spark", "application": 
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648 
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z        INFO        scheduler/partition.go:1299       
>  removing allocation from node        {"nodeID": "10.10.10.22", 
> "allocationId": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:27.187Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:27.340Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Pre)        {"nodes": 3, "pods": 57, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56, 
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z        DEBUG        external/scheduler_cache.go:179  
>       Updating node in cache        {"nodeName": "10.10.10.22"}
> 2024-06-20T17:49:27.340Z        DEBUG        external/scheduler_cache.go:558  
>       Scheduler cache state (UpdateNode.Post)        {"nodes": 3, "pods": 57, 
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56, 
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z        DEBUG        cache/node.go:109        set 
> node capacity        {"nodeID": "10.10.10.22", "capacity": 
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > > 
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\" 
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:6957 > > 
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:36311140158 > > 
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > > 
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > > 
> resources:<key:\"memory\" value:<value:64713949184 > > 
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\" 
> value:<value:14000 > > "}
> 2024-06-20T17:49:27.341Z        INFO        cache/nodes.go:179        Node's 
> ready status flag        {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:49:27.341Z        INFO        cache/nodes.go:184        report 
> updated nodes to scheduler        {"request": 
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:27.341Z        DEBUG        scheduler/scheduler.go:96        
> enqueued event        {"eventType": "*rmevent.RMUpdateNodeEvent", "event": 
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
>  "currentQueueSize": 0}
> 2024-06-20T17:49:27.341Z        INFO        objects/queue.go:1190        
> updating root queue max resources        {"current max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]", "new max": 
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0 
> kubernetes.io/batch-cpu:20060 kubernetes.io/batch-memory:109365558213 
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621 
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:27.341Z        DEBUG        objects/node.go:182        skip 
> updating occupiedResource, not changed
> 2024-06-20T17:49:28.188Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:28.240Z        DEBUG        
> scheduler/partition_manager.go:83        time consumed for queue cleaner      
>   {"duration": "9.056µs"}
> 2024-06-20T17:49:29.189Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> 2024-06-20T17:49:30.189Z        DEBUG        scheduler/scheduler.go:157       
>  inspect outstanding requests
> {code}
>  
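
For context, the gang-scheduling configuration that drove this behaviour is visible in the driver pod metadata quoted above. A minimal sketch reconstructing those annotations (values copied from the log; the annotation keys are the YuniKorn ones shown there) is:

```python
import json

# Gang-scheduling annotations as they appear on the Spark driver pod in the
# quoted log. placeholderTimeoutInSeconds=60 matches the "Timeout": "1m0s"
# placeholder timer, and gangSchedulingStyle=Hard explains why the app fails
# rather than falling back when placeholders time out.
annotations = {
    "yunikorn.apache.org/app-id": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
    "yunikorn.apache.org/schedulingPolicyParameters":
        "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard",
    "yunikorn.apache.org/task-group-name": "spark-driver",
    # The task-groups value is a JSON string; YuniKorn parses it to create
    # one placeholder pod per minMember in each group.
    "yunikorn.apache.org/task-groups": json.dumps([
        {"name": "spark-driver", "minMember": 1,
         "minResource": {"cpu": "1", "memory": "2Gi"}},
        {"name": "spark-executor", "minMember": 1,
         "minResource": {"cpu": "1", "memory": "2Gi"}},
    ]),
}

# Round-trip check: the embedded JSON must parse, or the shim rejects the pod.
task_groups = json.loads(annotations["yunikorn.apache.org/task-groups"])
```

With these settings the scheduler creates two placeholders (one per task group) and, under the Hard style, releases everything on timeout instead of scheduling whatever fits, which matches the `terminationType: "TIMEOUT"` releases in the log.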



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
