huangzhir created YUNIKORN-2687:
-----------------------------------
Summary: Placeholder Timeout and Replacement Failure in Gang
Scheduling
Key: YUNIKORN-2687
URL: https://issues.apache.org/jira/browse/YUNIKORN-2687
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: huangzhir
h1. *Description:*
When using gang scheduling with YuniKorn, the driver pod encounters a
placeholder timeout, leading to a failure in replacement. The pod shows a
pending status for approximately 60 seconds.
h2. *Observed Behavior:*
* The driver pod ({{{}spark-pi-d86d1d9036b8e8e9-driver{}}}) is queued and
waiting for allocation.
* The pod belongs to the {{spark-driver}} task group and is scheduled as a
gang member.
* A warning indicating "Placeholder timed out" is logged, and the placeholder
is not replaced successfully.
* The pod is eventually assigned and bound to a node, and the task completes.
* There is a 60-second pending period observed for the driver pod.
h2. *Pod Status:*
{code:java}
kubectl get pod -n spark
NAME READY STATUS
RESTARTS AGE
spark-pi-6d2eea9036f9c838-driver 0/1 Pending 0
61s
tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0 1/1 Terminating 0
60s
tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0 1/1 Running 0
60s
kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark
......
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduling 2m52s yunikorn
spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation
Normal GangScheduling 2m52s yunikorn Pod belongs to the taskGroup
spark-driver, it will be scheduled as a gang member
Warning Placeholder timed out 113s yunikorn Application
spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out
Normal Scheduled 100s yunikorn Successfully assigned
spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66
Normal PodBindSuccessful 100s yunikorn Pod
spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node 10.10.10.66
Normal TaskCompleted 50s yunikorn Task
spark/spark-pi-d86d1d9036b8e8e9-driver is completed
Normal Pulled 99s kubelet Container image
"apache/spark:v3.3.2" already present on machine
Normal Created 99s kubelet Created container
spark-kubernetes-driver
Normal Started 99s kubelet Started container
spark-kubernetes-driver{code}
h2. *Scheduler Logs:*
{code:java}
2024-06-20T17:49:26.093Z INFO objects/application.go:440
Placeholder timeout, releasing placeholders {"AppID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
"releasing placeholders": 1}
2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
"event":
{"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
allocated placeholders on placeholder
timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
"currentQueueSize": 0}
2024-06-20T17:49:26.093Z DEBUG callback/scheduler_callback.go:48
UpdateAllocation callback received {"UpdateAllocationResponse":
"released:<partitionName:\"default\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
message:\"releasing allocated placeholders on placeholder timeout\"
allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
2024-06-20T17:49:26.093Z DEBUG callback/scheduler_callback.go:81
callback: response to released allocations {"UUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
2024-06-20T17:49:26.093Z DEBUG cache/context.go:620 forget
pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
Removing assumed pod from cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
Successfully synced shim on released allocations. response: no. of allocations:
1
2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
Application placeholder timer cleared {"AppID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
2024-06-20T17:49:26.093Z INFO cache/application.go:637 try
to release pod from application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
trying to send placeholder timeout events to the original pod from application
{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app request
originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver spark
d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23 +0000 UTC
<nil> <nil> map[queue:root.spark spark-app-name:spark-pi
spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
spark-version:3.3.2]
map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\": 1,
\"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
\"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
\"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
UTC FieldsV1
{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
}]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
--properties-file /opt/spark/conf/spark.properties --class
org.apache.spark.examples.SparkPi
local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
{{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
{{1975517184 0} {<nil>} 1884Mi
BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
"taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType": "TIMEOUT"}
2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests ... {code}
{{}}
h2. *Reproduction Steps:*
Submit the Spark job with the following command:
{code:java}
./bin/spark-submit \
--master k8s://https://10.10.10.10:6443 \
--deploy-mode cluster \
--name spark-pi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \
--conf spark.kubernetes.namespace=spark \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=1500m \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1500m \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.kubernetes.driver.limit.memory=2G \
--conf spark.kubernetes.executor.limit.cores=1 \
--conf spark.kubernetes.executor.limit.memory=2G \
--conf spark.kubernetes.driver.label.app=spark \
--conf spark.kubernetes.executor.label.app=spark \
--conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
--conf spark.kubernetes.scheduler.name=yunikorn \
--conf spark.kubernetes.driver.label.queue=root.spark \
--conf spark.kubernetes.executor.label.queue=root.spark \
--conf
spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf
spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver
\
--conf
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name":
"spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory":
"2Gi"},"nodeSelector": {"app": "spark"} }, {"name": "spark-executor",
"minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"},"nodeSelector":
{"app": "spark"} }]' \
--conf
spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=60
gangSchedulingStyle=Hard' \
--conf
spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor
\
local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar \
3000 {code}
h2. Environment:
* YuniKorn version: 1.3.0
* Kubernetes version: 1.21.3
* Spark version: 3.2.2
h2. *resource-quota.yaml*
*None*
h2. yunikorn-configs.yaml
{code:java}
apiVersion: v1
kind: ConfigMap
metadata:
name: yunikorn-configs
namespace: yunikorn
data:
log.level: "-1"
log.admission.level: "-1"
log.core.config.level: "-1"
queues.yaml: |
partitions:
- name: default
placementrules:
- name: tag
value: namespace
create: true
queues:
- name: root
submitacl: '*'
properties:
application.sort.policy: fifo
placeholderTimeoutInSeconds: 60
schedulingStyle: Hard
queues:
- name: spark
properties:
application.sort.policy: fifo
placeholderTimeoutInSeconds: 60
schedulingStyle: Hard
resources:
guaranteed:
vcore: 10
memory: 10Gi
max:
vcore: 10
memory: 10Gi {code}
**
h2. scheduler logs
{code:java}
2024-06-20T17:48:23.144Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:23.744Z DEBUG cache/context.go:233 adding
pod to cache {"podName": "spark-pi-0671699036c3885c-driver"}
2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 55,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":2,"Running":53}}
2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName": "spark-pi-0671699036c3885c-driver",
"podKey": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
2024-06-20T17:48:23.744Z DEBUG general/general.go:141 pod
added {"appType": "general", "Name": "spark-pi-0671699036c3885c-driver",
"Namespace": "spark"}
2024-06-20T17:48:23.744Z DEBUG utils/utils.go:305 Unable
to retrieve user name from pod labels. Empty user label {"userLabel":
"yunikorn.apache.org/username"}
2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 56,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":3,"Running":53}}
2024-06-20T17:48:23.744Z DEBUG cache/context.go:737
AddApplication {"Request":
{"Metadata":{"ApplicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","QueueName":"root.spark","User":"nobody","Tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
\"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
\"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
\"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
}]"},"Groups":null,"TaskGroups":[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}},{"name":"spark-executor","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}}],"OwnerReferences":[{"apiVersion":"v1","kind":"Pod","name":"spark-pi-0671699036c3885c-driver","uid":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","controller":false,"blockOwnerDeletion":true}],"SchedulingPolicyParameters":{},"CreationTime":0}}}
2024-06-20T17:48:23.744Z DEBUG cache/context.go:746 app
namespace info {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"namespace": "spark"}
2024-06-20T17:48:23.744Z INFO cache/context.go:773 app
added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
2024-06-20T17:48:23.744Z DEBUG cache/context.go:841
AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
2024-06-20T17:48:23.744Z INFO cache/context.go:863 task
added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskState": "New"}
2024-06-20T17:48:23.744Z INFO cache/context.go:873 app
request originating pod added {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "original task":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
shim app state transition {"app":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
"Submitted", "event": "SubmitApplication"}
2024-06-20T17:48:24.090Z INFO cache/application.go:424
handle app submission {"app": "applicationID:
spark-e1cdb4ac69504b4aacdc9ec74b0322fb, queue: root.spark, partition: default,
totalNumOfTasks: 1, currentState: Submitted", "clusterID": "mycluster"}
2024-06-20T17:48:24.090Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateApplicationEvent",
"event":
{"Request":{"new":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","queueName":"root.spark","partitionName":"[mycluster]default","ugi":{"user":"nobody"},"tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
\"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
\"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
\"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
}]"},"executionTimeoutMilliSeconds":60000,"placeholderAsk":{"resources":{"memory":{"value":4294967296},"pods":{"value":2},"vcore":{"value":2000}}},"gangSchedulingStyle":"Hard"}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:48:24.090Z DEBUG placement/placement.go:145
Executing rule for placing application {"ruleName": "tag",
"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
2024-06-20T17:48:24.090Z DEBUG placement/tag_rule.go:106
Tag rule intermediate result {"application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
2024-06-20T17:48:24.090Z INFO placement/tag_rule.go:115
Tag rule application placed {"application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
2024-06-20T17:48:24.090Z DEBUG placement/placement.go:204
Rule result for placing application {"application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queueName": "root.spark"}
2024-06-20T17:48:24.090Z INFO scheduler/context.go:549
Added application to partition {"applicationID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "partitionName":
"[mycluster]default", "requested queue": "root.spark", "placed queue":
"root.spark"}
2024-06-20T17:48:24.090Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
"event":
{"RmID":"mycluster","AcceptedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}],"RejectedApplications":[],"UpdatedApplications":null},
"currentQueueSize": 0}
2024-06-20T17:48:24.090Z DEBUG callback/scheduler_callback.go:108
UpdateApplication callback received {"UpdateApplicationResponse":
"accepted:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > "}
2024-06-20T17:48:24.090Z DEBUG callback/scheduler_callback.go:114
callback: response to accepted application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
2024-06-20T17:48:24.090Z INFO callback/scheduler_callback.go:118
Accepting app {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
shim app state transition {"app":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Submitted", "destination":
"Accepted", "event": "AcceptApplication"}
2024-06-20T17:48:24.145Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:25.091Z DEBUG cache/application.go:516
postAppAccepted on cached app {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "numTaskGroups": 2,
"numAllocatedTasks": 0}
2024-06-20T17:48:25.091Z INFO cache/application.go:526 app
has taskGroups defined, trying to reserve resources for gang members
{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
2024-06-20T17:48:25.091Z DEBUG cache/application_state.go:508
shim app state transition {"app":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted", "destination":
"Reserving", "event": "TryReserve"}
2024-06-20T17:48:25.102Z INFO cache/placeholder_manager.go:103
placeholder created {"placeholder": "appID:
spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-driver, podName:
spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:25.102Z DEBUG general/general.go:141 pod
added {"appType": "general", "Name":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
2024-06-20T17:48:25.102Z DEBUG utils/utils.go:305 Unable
to retrieve user name from pod labels. Empty user label {"userLabel":
"yunikorn.apache.org/username"}
2024-06-20T17:48:25.102Z DEBUG cache/context.go:841
AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:25.102Z INFO cache/context.go:863 task
added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskState": "New"}
2024-06-20T17:48:25.102Z DEBUG cache/context.go:233 adding
pod to cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 56,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":3,"Running":53}}
2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 57,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":4,"Running":53}}
2024-06-20T17:48:25.105Z INFO cache/placeholder_manager.go:103
placeholder created {"placeholder": "appID:
spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-executor, podName:
spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:25.105Z DEBUG general/general.go:141 pod
added {"appType": "general", "Name":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
2024-06-20T17:48:25.105Z DEBUG utils/utils.go:305 Unable
to retrieve user name from pod labels. Empty user label {"userLabel":
"yunikorn.apache.org/username"}
2024-06-20T17:48:25.105Z DEBUG cache/context.go:841
AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:25.105Z INFO cache/context.go:863 task
added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"4c776fc0-e9c0-4279-806f-8386508d5202", "taskState": "New"}
2024-06-20T17:48:25.105Z DEBUG cache/context.go:233 adding
pod to cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 57,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":4,"Running":53}}
2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:25.146Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:26.092Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
"spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
"destination": "Pending", "event": "InitTask"}
2024-06-20T17:48:26.092Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
"destination": "Pending", "event": "InitTask"}
2024-06-20T17:48:26.092Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
"spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Pending",
"destination": "Scheduling", "event": "SubmitTask"}
2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
scheduling pod {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
update request {"request":
"asks:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
value:<value:1000 > > > maxAllocations:1
tags:<key:\"kubernetes.io/label/applicationId\"
value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
tags:<key:\"kubernetes.io/meta/podName\"
value:\"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0\" >
taskGroupName:\"spark-driver\" placeholder:true
preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
rmID:\"mycluster\" "}
2024-06-20T17:48:26.092Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Pending",
"destination": "Scheduling", "event": "SubmitTask"}
2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
scheduling pod {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
update request {"request":
"asks:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
value:<value:1000 > > > maxAllocations:1
tags:<key:\"kubernetes.io/label/applicationId\"
value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
tags:<key:\"kubernetes.io/meta/podName\"
value:\"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0\" >
taskGroupName:\"spark-executor\" placeholder:true
preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
rmID:\"mycluster\" "}
2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
"event":
{"Request":{"asks":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-driver","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
"event":
{"Request":{"asks":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-executor","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
"currentQueueSize": 1}
2024-06-20T17:48:26.092Z INFO objects/application_state.go:133
Application state transition {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
"Accepted", "event": "runApplication"}
2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
"event":
{"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Accepted","stateTransitionTimestamp":1718905706092997053,"message":"runApplication"}]},
"currentQueueSize": 0}
2024-06-20T17:48:26.093Z INFO objects/application.go:668
ask added successfully to application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "placeholder": true, "pendingDelta":
"map[memory:2147483648 pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z INFO objects/application.go:668
ask added successfully to application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
"4c776fc0-e9c0-4279-806f-8386508d5202", "placeholder": true, "pendingDelta":
"map[memory:2147483648 pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:108
UpdateApplication callback received {"UpdateApplicationResponse":
"updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
state:\"Accepted\" stateTransitionTimestamp:1718905706092997053
message:\"runApplication\" > "}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:137
status update callback received {"appId":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status": "Accepted"}
2024-06-20T17:48:26.093Z DEBUG objects/application.go:397
Application placeholder timer initiated {"AppID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
Increasing resource usage {"user": "nobody", "queue path": "root.spark",
"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource":
"map[memory:2147483648 pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:265 Group
tracker already exists and linking (reusing) the same with application
{"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue path":
"root.spark", "user": "nobody", "group": "nobody"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
allocation found on queue {"queueName": "root.spark", "appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
"applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
uuid=bdb020e8-708c-4eb7-b48c-fba16155941c,
allocationKey=470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7, Node=10.10.10.66,
result=Allocated"}
2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
scheduler allocation processed {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "uuid":
"bdb020e8-708c-4eb7-b48c-fba16155941c", "allocatedResource":
"map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true, "targetNode":
"10.10.10.66"}
2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
{"RmID":"mycluster","Allocations":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.66","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-driver","placeholder":true}]},
"currentQueueSize": 0}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:48
UpdateAllocation callback received {"UpdateAllocationResponse":
"new:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
value:<value:1000 > > > nodeID:\"10.10.10.66\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
taskGroupName:\"spark-driver\" placeholder:true > "}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:53
callback: response to new allocation {"allocationKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "UUID":
"bdb020e8-708c-4eb7-b48c-fba16155941c", "applicationID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.66"}
2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:476
Adding assumed pod to cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "node": "10.10.10.66", "allBound": true}
2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.093Z DEBUG scheduler/context.go:853
Successfully synced shim on new allocation. response: no. of allocations: 1
2024-06-20T17:48:26.093Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
"spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Scheduling",
"destination": "Allocated", "event": "TaskAllocated"}
2024-06-20T17:48:26.093Z DEBUG cache/task.go:349 bind pod
volumes {"podName": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0",
"podUID": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:26.093Z INFO cache/context.go:499 Binding
Pod Volumes skipped: all volumes already bound {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.093Z DEBUG cache/task.go:362 bind pod
{"podName": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:26.093Z INFO client/kubeclient.go:112
bind pod to node {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "nodeID": "10.10.10.66"}
2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
Increasing resource usage {"user": "nobody", "queue path": "root.spark",
"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource":
"map[memory:2147483648 pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
Increasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]"}
2024-06-20T17:48:26.093Z DEBUG objects/application.go:339
Application state timer initiated {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "state": "Starting", "timeout":
"5m0s"}
2024-06-20T17:48:26.093Z INFO objects/application_state.go:133
Application state transition {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted", "destination":
"Starting", "event": "runApplication"}
2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
"event":
{"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Starting","stateTransitionTimestamp":1718905706093901659,"message":"runApplication"}]},
"currentQueueSize": 0}
2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
allocation found on queue {"queueName": "root.spark", "appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
"applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
uuid=c834ce9b-c084-4c80-aa38-1f9e4bfe8020,
allocationKey=4c776fc0-e9c0-4279-806f-8386508d5202, Node=10.10.10.22,
result=Allocated"}
2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
scheduler allocation processed {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
"4c776fc0-e9c0-4279-806f-8386508d5202", "uuid":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "allocatedResource":
"map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true, "targetNode":
"10.10.10.22"}
2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
{"RmID":"mycluster","Allocations":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.22","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-executor","placeholder":true}]},
"currentQueueSize": 1}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:108
UpdateApplication callback received {"UpdateApplicationResponse":
"updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
state:\"Starting\" stateTransitionTimestamp:1718905706093901659
message:\"runApplication\" > "}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:137
status update callback received {"appId":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status": "Starting"}
2024-06-20T17:48:26.093Z DEBUG callback/scheduler_callback.go:48
UpdateAllocation callback received {"UpdateAllocationResponse":
"new:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\"
resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
value:<value:1000 > > > nodeID:\"10.10.10.22\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
taskGroupName:\"spark-executor\" placeholder:true > "}
2024-06-20T17:48:26.094Z DEBUG callback/scheduler_callback.go:53
callback: response to new allocation {"allocationKey":
"4c776fc0-e9c0-4279-806f-8386508d5202", "UUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "applicationID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.22"}
2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:476
Adding assumed pod to cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202", "node": "10.10.10.22", "allBound": true}
2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.094Z DEBUG scheduler/context.go:853
Successfully synced shim on new allocation. response: no. of allocations: 1
2024-06-20T17:48:26.094Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
"Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
2024-06-20T17:48:26.094Z DEBUG cache/task.go:349 bind pod
volumes {"podName": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0",
"podUID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:26.094Z INFO cache/context.go:499 Binding
Pod Volumes skipped: all volumes already bound {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.094Z DEBUG cache/task.go:362 bind pod
{"podName": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:26.094Z INFO client/kubeclient.go:112
bind pod to node {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
"4c776fc0-e9c0-4279-806f-8386508d5202", "nodeID": "10.10.10.22"}
2024-06-20T17:48:26.097Z INFO cache/task.go:375
successfully bound pod {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.098Z INFO cache/task.go:427
placeholder is bound {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"taskName": "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0",
"taskGroupName": "spark-driver"}
2024-06-20T17:48:26.098Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
"spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Allocated",
"destination": "Bound", "event": "TaskBound"}
2024-06-20T17:48:26.098Z INFO cache/task.go:375
successfully bound pod {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:26.098Z INFO cache/task.go:427
placeholder is bound {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"taskName": "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0",
"taskGroupName": "spark-executor"}
2024-06-20T17:48:26.098Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
"Allocated", "destination": "Bound", "event": "TaskBound"}
2024-06-20T17:48:26.098Z DEBUG cache/application_state.go:508
shim app state transition {"app":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Reserving", "destination":
"Running", "event": "RunApplication"}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:26.147Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.093Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
"spark/spark-pi-0671699036c3885c-driver", "source": "New", "destination":
"Pending", "event": "InitTask"}
2024-06-20T17:48:27.093Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
"spark/spark-pi-0671699036c3885c-driver", "source": "Pending", "destination":
"Scheduling", "event": "SubmitTask"}
2024-06-20T17:48:27.093Z DEBUG cache/task.go:275
scheduling pod {"podName": "spark-pi-0671699036c3885c-driver"}
2024-06-20T17:48:27.093Z DEBUG cache/task.go:294 send
update request {"request":
"asks:<allocationKey:\"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
resourceAsk:<resources:<key:\"memory\" value:<value:1975517184 > >
resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
value:<value:1000 > > > maxAllocations:1
tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
tags:<key:\"kubernetes.io/label/spark-app-name\" value:\"spark-pi\" >
tags:<key:\"kubernetes.io/label/spark-app-selector\"
value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
tags:<key:\"kubernetes.io/label/spark-role\" value:\"driver\" >
tags:<key:\"kubernetes.io/label/spark-version\" value:\"3.3.2\" >
tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
tags:<key:\"kubernetes.io/meta/podName\"
value:\"spark-pi-0671699036c3885c-driver\" > taskGroupName:\"spark-driver\"
Originator:true preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true
> > rmID:\"mycluster\" "}
2024-06-20T17:48:27.093Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
"event":
{"Request":{"asks":[{"allocationKey":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":1975517184},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/queue":"root.spark","kubernetes.io/label/spark-app-name":"spark-pi","kubernetes.io/label/spark-app-selector":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/spark-role":"driver","kubernetes.io/label/spark-version":"3.3.2","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"spark-pi-0671699036c3885c-driver"},"taskGroupName":"spark-driver","Originator":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:48:27.093Z INFO objects/application.go:668
ask added successfully to application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta":
"map[memory:1975517184 pods:1 vcore:1000]"}
2024-06-20T17:48:27.093Z DEBUG objects/queue.go:1275
allocation found on queue {"queueName": "root.spark", "appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
"applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
uuid=a53e9cbb-931d-4d1d-95e5-8e2425ba95be,
allocationKey=d42081ec-a8c4-4fcb-8e40-e4739a67fbfe, Node=10.10.10.66,
result=Replaced"}
2024-06-20T17:48:27.093Z INFO scheduler/partition.go:828
scheduler replace placeholder processed {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid":
"a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid":
"bdb020e8-708c-4eb7-b48c-fba16155941c"}
2024-06-20T17:48:27.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
"event":
{"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","terminationType":4,"message":"replacing
uuid:
a53e9cbb-931d-4d1d-95e5-8e2425ba95be","allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}]},
"currentQueueSize": 0}
2024-06-20T17:48:27.093Z DEBUG callback/scheduler_callback.go:48
UpdateAllocation callback received {"UpdateAllocationResponse":
"released:<partitionName:\"default\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
terminationType:PLACEHOLDER_REPLACED message:\"replacing uuid:
a53e9cbb-931d-4d1d-95e5-8e2425ba95be\"
allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" > "}
2024-06-20T17:48:27.093Z DEBUG callback/scheduler_callback.go:81
callback: response to released allocations {"UUID":
"bdb020e8-708c-4eb7-b48c-fba16155941c"}
2024-06-20T17:48:27.093Z DEBUG cache/context.go:620 forget
pod {"pod": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
"assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:501
Removing assumed pod from cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.093Z DEBUG scheduler/context.go:884
Successfully synced shim on released allocations. response: no. of allocations:
1
2024-06-20T17:48:27.094Z INFO cache/application.go:637 try
to release pod from application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
"bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType":
"PLACEHOLDER_REPLACED"}
2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.147Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:179
Updating node in cache {"nodeName": "10.10.10.22"}
2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.290Z DEBUG cache/node.go:109 set node
capacity {"nodeID": "10.10.10.22", "capacity":
"resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5963 > >
resources:<key:\"kubernetes.io/batch-memory\" value:<value:34165359566 > >
resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
resources:<key:\"memory\" value:<value:64713949184 > > resources:<key:\"pods\"
value:<value:110 > > resources:<key:\"vcore\" value:<value:14000 > > "}
2024-06-20T17:48:27.290Z INFO cache/nodes.go:179 Node's
ready status flag {"Node name": "10.10.10.22", "ready": true}
2024-06-20T17:48:27.290Z INFO cache/nodes.go:184 report
updated nodes to scheduler {"request":
{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
2024-06-20T17:48:27.290Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
{"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:48:27.290Z INFO objects/queue.go:1190
updating root queue max resources {"current max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:20075 kubernetes.io/batch-memory:109430536751
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]", "new max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]"}
2024-06-20T17:48:27.290Z DEBUG objects/node.go:182 skip
updating occupiedResource, not changed
2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":5,"Running":53}}
2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":4,"Running":54}}
2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":4,"Running":54}}
2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:48:27.906Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:48:28.148Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:28.234Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "9.409µs"}
2024-06-20T17:48:29.149Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:30.149Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:31.150Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:32.151Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:33.151Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:34.152Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:35.153Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:36.154Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:37.131Z DEBUG scheduler/health_checker.go:179
Scheduler is healthy {"health check values": [{"Name":"Scheduling
errors","Succeeded":true,"Description":"Check for scheduling error entries in
metrics","DiagnosisMessage":"There were 0 scheduling errors logged in the
metrics"},{"Name":"Failed nodes","Succeeded":true,"Description":"Check for
failed nodes entries in metrics","DiagnosisMessage":"There were 0 failed nodes
logged in the metrics"},{"Name":"Negative
resources","Succeeded":true,"Description":"Check for negative resources in the
partitions","DiagnosisMessage":"Partitions with negative resources:
[]"},{"Name":"Negative resources","Succeeded":true,"Description":"Check for
negative resources in the nodes","DiagnosisMessage":"Nodes with negative
resources: []"},{"Name":"Consistency of
data","Succeeded":true,"Description":"Check if a node's allocated resource <=
total resource of the node","DiagnosisMessage":"Nodes with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
total partition resource == sum of the node resources from the
partition","DiagnosisMessage":"Partitions with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
node total resource = allocated resource + occupied resource + available
resource","DiagnosisMessage":"Nodes with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
with inconsistent data: []"},{"Name":"Reservation
check","Succeeded":true,"Description":"Check the reservation nr compared to the
number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
[0.000000]"},{"Name":"Orphan allocation on node
check","Succeeded":true,"Description":"Check if there are orphan allocations on
the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
allocation on app check","Succeeded":true,"Description":"Check if there are
orphan allocations on the applications","DiagnosisMessage":"OrphanAllocations:
[]"}]}
2024-06-20T17:48:37.154Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:38.155Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:38.235Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "6.423µs"}
2024-06-20T17:48:39.155Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:40.156Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:41.156Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:42.157Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:43.158Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:44.158Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:45.159Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:46.160Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:47.161Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:48.162Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:48.236Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "7.983µs"}
2024-06-20T17:48:49.162Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:50.163Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:51.164Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:52.165Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:53.166Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:54.168Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:55.168Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:56.169Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:57.169Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:58.169Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:48:58.237Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "8.146µs"}
2024-06-20T17:48:59.170Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:00.170Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:01.170Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:02.171Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:03.171Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:04.172Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:05.173Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:06.174Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:07.131Z DEBUG metrics/metrics_collector.go:68
Adding current status to historical partition data
2024-06-20T17:49:07.131Z DEBUG scheduler/health_checker.go:179
Scheduler is healthy {"health check values": [{"Name":"Scheduling
errors","Succeeded":true,"Description":"Check for scheduling error entries in
metrics","DiagnosisMessage":"There were 0 scheduling errors logged in the
metrics"},{"Name":"Failed nodes","Succeeded":true,"Description":"Check for
failed nodes entries in metrics","DiagnosisMessage":"There were 0 failed nodes
logged in the metrics"},{"Name":"Negative
resources","Succeeded":true,"Description":"Check for negative resources in the
partitions","DiagnosisMessage":"Partitions with negative resources:
[]"},{"Name":"Negative resources","Succeeded":true,"Description":"Check for
negative resources in the nodes","DiagnosisMessage":"Nodes with negative
resources: []"},{"Name":"Consistency of
data","Succeeded":true,"Description":"Check if a node's allocated resource <=
total resource of the node","DiagnosisMessage":"Nodes with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
total partition resource == sum of the node resources from the
partition","DiagnosisMessage":"Partitions with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
node total resource = allocated resource + occupied resource + available
resource","DiagnosisMessage":"Nodes with inconsistent data:
[]"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
with inconsistent data: []"},{"Name":"Reservation
check","Succeeded":true,"Description":"Check the reservation nr compared to the
number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
[0.000000]"},{"Name":"Orphan allocation on node
check","Succeeded":true,"Description":"Check if there are orphan allocations on
the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
allocation on app check","Succeeded":true,"Description":"Check if there are
orphan allocations on the applications","DiagnosisMessage":"OrphanAllocations:
[]"}]}
2024-06-20T17:49:07.174Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:08.158Z DEBUG security/usergroup.go:96
time consumed cleaning the UserGroupCache {"duration": "2.835µs"}
2024-06-20T17:49:08.175Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:08.238Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "7.683µs"}
2024-06-20T17:49:09.176Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:10.176Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:11.177Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:12.178Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:13.179Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:13.199Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:179
Updating node in cache {"nodeName": "10.10.10.66"}
2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:13.200Z DEBUG cache/node.go:109 set node
capacity {"nodeID": "10.10.10.66", "capacity":
"resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5926 > >
resources:<key:\"kubernetes.io/batch-memory\" value:<value:34326439513 > >
resources:<key:\"kubernetes.io/mid-cpu\" value:<value:2110 > >
resources:<key:\"kubernetes.io/mid-memory\" value:<> >
resources:<key:\"memory\" value:<value:64713940992 > > resources:<key:\"pods\"
value:<value:110 > > resources:<key:\"vcore\" value:<value:14000 > > "}
2024-06-20T17:49:13.200Z INFO cache/nodes.go:179 Node's
ready status flag {"Node name": "10.10.10.66", "ready": true}
2024-06-20T17:49:13.200Z INFO cache/nodes.go:184 report
updated nodes to scheduler {"request":
{"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
2024-06-20T17:49:13.200Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
{"Request":{"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:49:13.200Z INFO objects/queue.go:1190
updating root queue max resources {"current max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]", "new max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]"}
2024-06-20T17:49:13.200Z DEBUG objects/node.go:182 skip
updating occupiedResource, not changed
2024-06-20T17:49:14.179Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:15.180Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:16.181Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:17.181Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:18.182Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:18.239Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "8.489µs"}
2024-06-20T17:49:19.183Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:20.184Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:21.184Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:22.185Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:23.185Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:24.186Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:25.186Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:26.093Z INFO objects/application.go:440
Placeholder timeout, releasing placeholders {"AppID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
"releasing placeholders": 1}
2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
"event":
{"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
allocated placeholders on placeholder
timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
"currentQueueSize": 0}
2024-06-20T17:49:26.093Z DEBUG callback/scheduler_callback.go:48
UpdateAllocation callback received {"UpdateAllocationResponse":
"released:<partitionName:\"default\"
applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
message:\"releasing allocated placeholders on placeholder timeout\"
allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
2024-06-20T17:49:26.093Z DEBUG callback/scheduler_callback.go:81
callback: response to released allocations {"UUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
2024-06-20T17:49:26.093Z DEBUG cache/context.go:620 forget
pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
Removing assumed pod from cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
Successfully synced shim on released allocations. response: no. of allocations:
1
2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
Application placeholder timer cleared {"AppID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
2024-06-20T17:49:26.093Z INFO cache/application.go:637 try
to release pod from application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
trying to send placeholder timeout events to the original pod from application
{"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app request
originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver spark
d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23 +0000 UTC
<nil> <nil> map[queue:root.spark spark-app-name:spark-pi
spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
spark-version:3.3.2]
map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\": 1,
\"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
\"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
\"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
UTC FieldsV1
{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
}]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
--properties-file /opt/spark/conf/spark.properties --class
org.apache.spark.examples.SparkPi
local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
{{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
{{1975517184 0} {<nil>} 1884Mi
BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
"taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType": "TIMEOUT"}
2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:411
Putting pod in cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.488Z DEBUG cache/context.go:272
Request to update terminated pod, removing from cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (RemovePod.Pre) {"nodes": 3, "pods": 58,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
"phases": {"Pending":3,"Running":55}}
2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:434
Removing deleted pod from cache {"podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
"4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.488Z INFO general/general.go:170 task
completes {"appType": "general", "namespace": "spark", "podName":
"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
"4c776fc0-e9c0-4279-806f-8386508d5202", "podStatus": "Succeeded"}
2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (RemovePod.Post) {"nodes": 3, "pods": 57,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
"phases": {"Pending":3,"Running":54}}
2024-06-20T17:49:26.488Z DEBUG cache/context.go:679
NotifyTaskComplete {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.488Z DEBUG cache/context.go:683
release allocation {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
2024-06-20T17:49:26.488Z DEBUG cache/task.go:479 prepare
to send release request {"applicationID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
"4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "allocationUUID":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "task": "Bound", "terminationType":
"TIMEOUT"}
2024-06-20T17:49:26.489Z INFO cache/task.go:510 releasing
allocations {"numOfAsksToRelease": 0, "numOfAllocationsToRelease": 1}
2024-06-20T17:49:26.489Z INFO cache/task_state.go:380 Task
state transition {"app": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
"task": "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
"spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Bound",
"destination": "Completed", "event": "CompleteTask"}
2024-06-20T17:49:26.489Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
"event":
{"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"task
completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1245
removing allocation from application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationId":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
2024-06-20T17:49:26.489Z DEBUG ugm/manager.go:124
Decreasing resource usage {"user": "nobody", "queue path": "root.spark",
"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource":
"map[memory:2147483648 pods:1 vcore:1000]", "removeApp": false}
2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
Decreasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]", "removeApp": false}
2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
Decreasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]", "removeApp": false}
2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
Decreasing resource usage {"queue path": "root.spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]", "removeApp": false}
2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
Decreasing resource usage {"queue path": "spark", "application":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
pods:1 vcore:1000]", "removeApp": false}
2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1299
removing allocation from node {"nodeID": "10.10.10.22", "allocationId":
"c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
2024-06-20T17:49:27.187Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 57,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
"phases": {"Pending":3,"Running":54}}
2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:179
Updating node in cache {"nodeName": "10.10.10.22"}
2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 57,
"assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
"phases": {"Pending":3,"Running":54}}
2024-06-20T17:49:27.340Z DEBUG cache/node.go:109 set node
capacity {"nodeID": "10.10.10.22", "capacity":
"resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:6957 > >
resources:<key:\"kubernetes.io/batch-memory\" value:<value:36311140158 > >
resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
resources:<key:\"memory\" value:<value:64713949184 > > resources:<key:\"pods\"
value:<value:110 > > resources:<key:\"vcore\" value:<value:14000 > > "}
2024-06-20T17:49:27.341Z INFO cache/nodes.go:179 Node's
ready status flag {"Node name": "10.10.10.22", "ready": true}
2024-06-20T17:49:27.341Z INFO cache/nodes.go:184 report
updated nodes to scheduler {"request":
{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
2024-06-20T17:49:27.341Z DEBUG scheduler/scheduler.go:96
enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
{"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
"currentQueueSize": 0}
2024-06-20T17:49:27.341Z INFO objects/queue.go:1190
updating root queue max resources {"current max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]", "new max":
"map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
kubernetes.io/batch-cpu:20060 kubernetes.io/batch-memory:109365558213
kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
memory:194141839360 pods:330 vcore:42000]"}
2024-06-20T17:49:27.341Z DEBUG objects/node.go:182 skip
updating occupiedResource, not changed
2024-06-20T17:49:28.188Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:28.240Z DEBUG scheduler/partition_manager.go:83
time consumed for queue cleaner {"duration": "9.056µs"}
2024-06-20T17:49:29.189Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
2024-06-20T17:49:30.189Z DEBUG scheduler/scheduler.go:157
inspect outstanding requests
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]