[
https://issues.apache.org/jira/browse/YUNIKORN-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wilfred Spiegelenburg resolved YUNIKORN-2687.
---------------------------------------------
Resolution: Cannot Reproduce
Closing as not reproducible; the report was made against an old version of YuniKorn.
> Placeholder Timeout and Replacement Failure in Gang Scheduling
> --------------------------------------------------------------
>
> Key: YUNIKORN-2687
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2687
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: huangzhir
> Assignee: Tzu-Hua Lan
> Priority: Blocker
>
> h1. *Description:*
> When using gang scheduling with YuniKorn, the driver pod hits a placeholder
> timeout and the placeholder replacement fails. The driver pod remains in
> Pending status for approximately 60 seconds.
> h2. *Observed Behavior:*
> * The driver pod ({{spark-pi-d86d1d9036b8e8e9-driver}}) is queued and
> waiting for allocation.
> * The pod belongs to the {{spark-driver}} task group and is scheduled as a
> gang member.
> * A "Placeholder timed out" warning is logged, and the placeholder is not
> replaced successfully (see the minimal sketch after this list).
> * The pod is eventually assigned and bound to a node, and the task completes.
> * A 60-second pending period is observed for the driver pod.
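>
> A minimal Go sketch of that timeout path (illustrative only; {{app}},
> {{startPlaceholderTimer}} and {{clearPlaceholderTimer}} are hypothetical
> names, not YuniKorn's actual code): a one-shot timer is armed when the
> placeholder pods are requested; if it fires before every placeholder has
> been replaced, the remaining placeholders are released, matching the
> "Placeholder timeout, releasing placeholders" log line below:
> {code:go}
> package main
>
> import (
> 	"fmt"
> 	"time"
> )
>
> // app is a hypothetical stand-in for a gang-scheduled application.
> type app struct {
> 	id    string
> 	timer *time.Timer
> }
>
> // startPlaceholderTimer arms a one-shot timeout when the placeholder pods
> // are requested; if it fires, the remaining placeholders are released
> // ("Placeholder timeout, releasing placeholders" in the logs below).
> func (a *app) startPlaceholderTimer(d time.Duration, release func(appID string)) {
> 	a.timer = time.AfterFunc(d, func() { release(a.id) })
> }
>
> // clearPlaceholderTimer stops the timeout once it is no longer needed
> // ("Application placeholder timer cleared" in the logs below).
> func (a *app) clearPlaceholderTimer() {
> 	if a.timer != nil {
> 		a.timer.Stop()
> 	}
> }
>
> func main() {
> 	a := &app{id: "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 	// The report uses placeholderTimeoutInSeconds=60; a short duration is
> 	// used here so the demo finishes quickly.
> 	a.startPlaceholderTimer(100*time.Millisecond, func(appID string) {
> 		fmt.Println("Placeholder timeout, releasing placeholders:", appID)
> 	})
> 	// In the reported run the executor placeholder was never replaced, so
> 	// clearPlaceholderTimer was never reached and the timer fired at ~60s.
> 	time.Sleep(200 * time.Millisecond)
> }
> {code}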
> h2. *Pod Status:*
> {code:java}
> kubectl get pod -n spark
> NAME                                               READY   STATUS        RESTARTS   AGE
> spark-pi-6d2eea9036f9c838-driver                   0/1     Pending       0          61s
> tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0     1/1     Terminating   0          60s
> tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0   1/1     Running       0          60s
>
> kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark
> ......
> Type     Reason                 Age     From      Message
> ----     ------                 ----    ----      -------
> Normal   Scheduling             2m52s   yunikorn  spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation
> Normal   GangScheduling         2m52s   yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
> Warning  Placeholder timed out  113s    yunikorn  Application spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out
> Normal   Scheduled              100s    yunikorn  Successfully assigned spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66
> Normal   PodBindSuccessful      100s    yunikorn  Pod spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node 10.10.10.66
> Normal   TaskCompleted          50s     yunikorn  Task spark/spark-pi-d86d1d9036b8e8e9-driver is completed
> Normal   Pulled                 99s     kubelet   Container image "apache/spark:v3.3.2" already present on machine
> Normal   Created                99s     kubelet   Created container spark-kubernetes-driver
> Normal   Started                99s     kubelet   Started container spark-kubernetes-driver{code}
> h2. *Scheduler Logs (at placeholder timeout):*
> {code:java}
> 2024-06-20T17:49:26.093Z INFO objects/application.go:440
> Placeholder timeout, releasing placeholders {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
> allocated placeholders on placeholder
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
> message:\"releasing allocated placeholders on placeholder timeout\"
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
> Application placeholder timer cleared {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
> trying to send placeholder timeout events to the original pod from
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver
> spark d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
> spark-version:3.3.2]
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\":
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
> UTC FieldsV1
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
> --properties-file /opt/spark/conf/spark.properties --class
> org.apache.spark.examples.SparkPi
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
> {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
> {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests ... {code}
> h2. *Reproduction Steps:*
> Submit the Spark job with the following command:
> {code:java}
> ./bin/spark-submit \
> --master k8s://https://10.10.10.10:6443 \
> --deploy-mode cluster \
> --name spark-pi \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \
> --conf spark.kubernetes.namespace=spark \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.executor.instances=1 \
> --conf spark.executor.cores=1 \
> --conf spark.executor.memory=1500m \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=1500m \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.kubernetes.driver.limit.memory=2G \
> --conf spark.kubernetes.executor.limit.cores=1 \
> --conf spark.kubernetes.executor.limit.memory=2G \
> --conf spark.kubernetes.driver.label.app=spark \
> --conf spark.kubernetes.executor.label.app=spark \
> --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
> --conf spark.kubernetes.scheduler.name=yunikorn \
> --conf spark.kubernetes.driver.label.queue=root.spark \
> --conf spark.kubernetes.executor.label.queue=root.spark \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
> --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"}, "nodeSelector": {"app": "spark"}}, {"name": "spark-executor", "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"}, "nodeSelector": {"app": "spark"}}]' \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard' \
> --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar \
> 3000 {code}
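>
> The task-groups annotation value is plain JSON; a small Go sketch
> (illustrative only; the {{TaskGroup}} struct below is a local stand-in
> mirroring the fields visible in the annotation, not an import from
> YuniKorn) that builds the same value as the command above:
> {code:go}
> package main
>
> import (
> 	"encoding/json"
> 	"fmt"
> )
>
> // TaskGroup mirrors the JSON fields visible in the annotation above.
> type TaskGroup struct {
> 	Name         string            `json:"name"`
> 	MinMember    int               `json:"minMember"`
> 	MinResource  map[string]string `json:"minResource"`
> 	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
> }
>
> func main() {
> 	groups := []TaskGroup{
> 		{Name: "spark-driver", MinMember: 1,
> 			MinResource:  map[string]string{"cpu": "1", "memory": "2Gi"},
> 			NodeSelector: map[string]string{"app": "spark"}},
> 		{Name: "spark-executor", MinMember: 1,
> 			MinResource:  map[string]string{"cpu": "1", "memory": "2Gi"},
> 			NodeSelector: map[string]string{"app": "spark"}},
> 	}
> 	b, err := json.Marshal(groups)
> 	if err != nil {
> 		panic(err)
> 	}
> 	fmt.Println(string(b)) // paste as the task-groups annotation value
> }
> {code}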
> h2. Environment:
> * YuniKorn version: 1.3.0
> * Kubernetes version: 1.21.3
> * Spark version: 3.3.2
> h2. *resource-quota.yaml*
> *None*
> h2. yunikorn-configs.yaml
> {code:java}
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: yunikorn-configs
>   namespace: yunikorn
> data:
>   log.level: "-1"
>   log.admission.level: "-1"
>   log.core.config.level: "-1"
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: tag
>             value: namespace
>             create: true
>         queues:
>           - name: root
>             submitacl: '*'
>             properties:
>               application.sort.policy: fifo
>               placeholderTimeoutInSeconds: 60
>               schedulingStyle: Hard
>             queues:
>               - name: spark
>                 properties:
>                   application.sort.policy: fifo
>                   placeholderTimeoutInSeconds: 60
>                   schedulingStyle: Hard
>                 resources:
>                   guaranteed:
>                     vcore: 10
>                     memory: 10Gi
>                   max:
>                     vcore: 10
>                     memory: 10Gi {code}
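>
> Both the queue properties above and the pod annotation in the reproduction
> steps set {{placeholderTimeoutInSeconds=60}}. The annotation value is a
> space-separated list of key=value pairs; a minimal Go sketch of that format
> (illustrative only, not YuniKorn's actual parser):
> {code:go}
> package main
>
> import (
> 	"fmt"
> 	"strings"
> )
>
> // parsePolicyParams splits "k1=v1 k2=v2" into a map, mirroring the
> // schedulingPolicyParameters annotation format shown in this report.
> func parsePolicyParams(s string) map[string]string {
> 	out := make(map[string]string)
> 	for _, kv := range strings.Fields(s) {
> 		if k, v, ok := strings.Cut(kv, "="); ok {
> 			out[k] = v
> 		}
> 	}
> 	return out
> }
>
> func main() {
> 	p := parsePolicyParams("placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard")
> 	fmt.Println(p["placeholderTimeoutInSeconds"], p["gangSchedulingStyle"]) // 60 Hard
> }
> {code}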
> h2. *Scheduler Logs (from submission):*
> {code:java}
> 2024-06-20T17:48:23.144Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:233
> adding pod to cache {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 55,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":2,"Running":53}}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "spark-pi-0671699036c3885c-driver", "podKey":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "spark-pi-0671699036c3885c-driver", "Namespace": "spark"}
> 2024-06-20T17:48:23.744Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 56,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:737
> AddApplication {"Request":
> {"Metadata":{"ApplicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","QueueName":"root.spark","User":"nobody","Tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
> }]"},"Groups":null,"TaskGroups":[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}},{"name":"spark-executor","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}}],"OwnerReferences":[{"apiVersion":"v1","kind":"Pod","name":"spark-pi-0671699036c3885c-driver","uid":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","controller":false,"blockOwnerDeletion":true}],"SchedulingPolicyParameters":{},"CreationTime":0}}}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:746 app
> namespace info {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "namespace": "spark"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:773 app
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskState": "New"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:873 app
> request originating pod added {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "original task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
> "Submitted", "event": "SubmitApplication"}
> 2024-06-20T17:48:24.090Z INFO cache/application.go:424
> handle app submission {"app": "applicationID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, queue: root.spark, partition:
> default, totalNumOfTasks: 1, currentState: Submitted", "clusterID":
> "mycluster"}
> 2024-06-20T17:48:24.090Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateApplicationEvent",
> "event":
> {"Request":{"new":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","queueName":"root.spark","partitionName":"[mycluster]default","ugi":{"user":"nobody"},"tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
> }]"},"executionTimeoutMilliSeconds":60000,"placeholderAsk":{"resources":{"memory":{"value":4294967296},"pods":{"value":2},"vcore":{"value":2000}}},"gangSchedulingStyle":"Hard"}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z DEBUG placement/placement.go:145
> Executing rule for placing application {"ruleName": "tag",
> "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z DEBUG placement/tag_rule.go:106
> Tag rule intermediate result {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z INFO placement/tag_rule.go:115
> Tag rule application placed {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z DEBUG placement/placement.go:204
> Rule result for placing application {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queueName": "root.spark"}
> 2024-06-20T17:48:24.090Z INFO scheduler/context.go:549
> Added application to partition {"applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "partitionName":
> "[mycluster]default", "requested queue": "root.spark", "placed queue":
> "root.spark"}
> 2024-06-20T17:48:24.090Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}],"RejectedApplications":[],"UpdatedApplications":null},
> "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "accepted:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > "}
> 2024-06-20T17:48:24.090Z DEBUG
> callback/scheduler_callback.go:114 callback: response to accepted
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z INFO
> callback/scheduler_callback.go:118 Accepting app {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Submitted",
> "destination": "Accepted", "event": "AcceptApplication"}
> 2024-06-20T17:48:24.145Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:25.091Z DEBUG cache/application.go:516
> postAppAccepted on cached app {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "numTaskGroups": 2,
> "numAllocatedTasks": 0}
> 2024-06-20T17:48:25.091Z INFO cache/application.go:526
> app has taskGroups defined, trying to reserve resources for gang members
> {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:25.091Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted",
> "destination": "Reserving", "event": "TryReserve"}
> 2024-06-20T17:48:25.102Z INFO cache/placeholder_manager.go:103
> placeholder created {"placeholder": "appID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-driver, podName:
> spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.102Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.102Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskState": "New"}
> 2024-06-20T17:48:25.102Z DEBUG cache/context.go:233
> adding pod to cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 56,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z INFO cache/placeholder_manager.go:103
> placeholder created {"placeholder": "appID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-executor, podName:
> spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.105Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.105Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskState": "New"}
> 2024-06-20T17:48:25.105Z DEBUG cache/context.go:233
> adding pod to cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:25.146Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Pending",
> "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
> scheduling pod {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/applicationId\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0\" >
> taskGroupName:\"spark-driver\" placeholder:true
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Pending", "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
> scheduling pod {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/applicationId\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0\" >
> taskGroupName:\"spark-executor\" placeholder:true
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-driver","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-executor","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 1}
> 2024-06-20T17:48:26.092Z INFO objects/application_state.go:133
> Application state transition {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
> "Accepted", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Accepted","stateTransitionTimestamp":1718905706092997053,"message":"runApplication"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "placeholder": true, "pendingDelta":
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "placeholder": true, "pendingDelta":
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> state:\"Accepted\" stateTransitionTimestamp:1718905706092997053
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:137 status update callback received
> {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status":
> "Accepted"}
> 2024-06-20T17:48:26.093Z DEBUG objects/application.go:397
> Application placeholder timer initiated {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
> Increasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:265 Group
> tracker already exists and linking (reusing) the same with application
> {"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue path":
> "root.spark", "user": "nobody", "group": "nobody"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=bdb020e8-708c-4eb7-b48c-fba16155941c,
> allocationKey=470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7, Node=10.10.10.66,
> result=Allocated"}
> 2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
> scheduler allocation processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "uuid":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "allocatedResource":
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true,
> "targetNode": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
> {"RmID":"mycluster","Allocations":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.66","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-driver","placeholder":true}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse":
> "new:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > nodeID:\"10.10.10.66\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> taskGroupName:\"spark-driver\" placeholder:true > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:53 callback: response to new allocation
> {"allocationKey": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "UUID":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:476
> Adding assumed pod to cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "node": "10.10.10.66", "allBound":
> true}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z DEBUG scheduler/context.go:853
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.093Z DEBUG cache/task.go:349 bind
> pod volumes {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z INFO cache/context.go:499
> Binding Pod Volumes skipped: all volumes already bound {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.093Z DEBUG cache/task.go:362 bind
> pod {"podName": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0",
> "podUID": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z INFO client/kubeclient.go:112
> bind pod to node {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
> Increasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG objects/application.go:339
> Application state timer initiated {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "state": "Starting", "timeout":
> "5m0s"}
> 2024-06-20T17:48:26.093Z INFO objects/application_state.go:133
> Application state transition {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted",
> "destination": "Starting", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Starting","stateTransitionTimestamp":1718905706093901659,"message":"runApplication"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=c834ce9b-c084-4c80-aa38-1f9e4bfe8020,
> allocationKey=4c776fc0-e9c0-4279-806f-8386508d5202, Node=10.10.10.22,
> result=Allocated"}
> 2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
> scheduler allocation processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "uuid":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "allocatedResource":
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true,
> "targetNode": "10.10.10.22"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
> {"RmID":"mycluster","Allocations":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.22","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-executor","placeholder":true}]},
> "currentQueueSize": 1}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> state:\"Starting\" stateTransitionTimestamp:1718905706093901659
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:137 status update callback received
> {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status":
> "Starting"}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse":
> "new:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\"
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > nodeID:\"10.10.10.22\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> taskGroupName:\"spark-executor\" placeholder:true > "}
> 2024-06-20T17:48:26.094Z DEBUG
> callback/scheduler_callback.go:53 callback: response to new allocation
> {"allocationKey": "4c776fc0-e9c0-4279-806f-8386508d5202", "UUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:476
> Adding assumed pod to cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "node": "10.10.10.22", "allBound":
> true}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z DEBUG scheduler/context.go:853
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.094Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.094Z DEBUG cache/task.go:349 bind
> pod volumes {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z INFO cache/context.go:499
> Binding Pod Volumes skipped: all volumes already bound {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.094Z DEBUG cache/task.go:362 bind
> pod {"podName": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0",
> "podUID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z INFO client/kubeclient.go:112
> bind pod to node {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.097Z INFO cache/task.go:375
> successfully bound pod {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:427
> placeholder is bound {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName":
> "spark-driver"}
> 2024-06-20T17:48:26.098Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:375
> successfully bound pod {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:427
> placeholder is bound {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName":
> "spark-executor"}
> 2024-06-20T17:48:26.098Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Reserving",
> "destination": "Running", "event": "RunApplication"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.147Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
> "spark/spark-pi-0671699036c3885c-driver", "source": "New", "destination":
> "Pending", "event": "InitTask"}
> 2024-06-20T17:48:27.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
> "spark/spark-pi-0671699036c3885c-driver", "source": "Pending", "destination":
> "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:27.093Z DEBUG cache/task.go:275
> scheduling pod {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:27.093Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:1975517184 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/label/spark-app-name\" value:\"spark-pi\" >
> tags:<key:\"kubernetes.io/label/spark-app-selector\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/spark-role\" value:\"driver\" >
> tags:<key:\"kubernetes.io/label/spark-version\" value:\"3.3.2\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"spark-pi-0671699036c3885c-driver\" > taskGroupName:\"spark-driver\"
> Originator:true preemptionPolicy:<allowPreemptSelf:true
> allowPreemptOther:true > > rmID:\"mycluster\" "}
> 2024-06-20T17:48:27.093Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":1975517184},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/queue":"root.spark","kubernetes.io/label/spark-app-name":"spark-pi","kubernetes.io/label/spark-app-selector":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/spark-role":"driver","kubernetes.io/label/spark-version":"3.3.2","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"spark-pi-0671699036c3885c-driver"},"taskGroupName":"spark-driver","Originator":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta":
> "map[memory:1975517184 pods:1 vcore:1000]"}
> 2024-06-20T17:48:27.093Z DEBUG objects/queue.go:1275
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=a53e9cbb-931d-4d1d-95e5-8e2425ba95be,
> allocationKey=d42081ec-a8c4-4fcb-8e40-e4739a67fbfe, Node=10.10.10.66,
> result=Replaced"}
> 2024-06-20T17:48:27.093Z INFO scheduler/partition.go:828
> scheduler replace placeholder processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid":
> "a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid":
> "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","terminationType":4,"message":"replacing
> uuid:
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be","allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
> terminationType:PLACEHOLDER_REPLACED message:\"replacing uuid:
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be\"
> allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" > "}
> 2024-06-20T17:48:27.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z DEBUG scheduler/context.go:884
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:48:27.094Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType":
> "PLACEHOLDER_REPLACED"}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.147Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.22"}
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.22", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5963 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34165359566 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
> resources:<key:\"memory\" value:<value:64713949184 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:48:27.290Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:48:27.290Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:48:27.290Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.290Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:20075 kubernetes.io/batch-memory:109430536751
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:48:27.290Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.906Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:48:28.148Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:28.234Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "9.409µs"}
> 2024-06-20T17:48:29.149Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:30.149Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:31.150Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:32.151Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:33.151Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:34.152Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:35.153Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:36.154Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:37.131Z DEBUG scheduler/health_checker.go:179
> Scheduler is healthy {"health check values":
> [{"Name":"Scheduling errors","Succeeded":true,"Description":"Check for
> scheduling error entries in metrics","DiagnosisMessage":"There were 0
> scheduling errors logged in the metrics"},{"Name":"Failed
> nodes","Succeeded":true,"Description":"Check for failed nodes entries in
> metrics","DiagnosisMessage":"There were 0 failed nodes logged in the
> metrics"},{"Name":"Negative resources","Succeeded":true,"Description":"Check
> for negative resources in the partitions","DiagnosisMessage":"Partitions with
> negative resources: []"},{"Name":"Negative
> resources","Succeeded":true,"Description":"Check for negative resources in
> the nodes","DiagnosisMessage":"Nodes with negative resources:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if a
> node's allocated resource <= total resource of the
> node","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> total partition resource == sum of the node resources from the
> partition","DiagnosisMessage":"Partitions with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node total resource = allocated resource + occupied resource + available
> resource","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
> with inconsistent data: []"},{"Name":"Reservation
> check","Succeeded":true,"Description":"Check the reservation nr compared to
> the number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
> [0.000000]"},{"Name":"Orphan allocation on node
> check","Succeeded":true,"Description":"Check if there are orphan allocations
> on the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
> allocation on app check","Succeeded":true,"Description":"Check if there are
> orphan allocations on the
> applications","DiagnosisMessage":"OrphanAllocations: []"}]}
> 2024-06-20T17:48:37.154Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:38.155Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:38.235Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "6.423µs"}
> 2024-06-20T17:48:39.155Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:40.156Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:41.156Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:42.157Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:43.158Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:44.158Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:45.159Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:46.160Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:47.161Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:48.162Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:48.236Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "7.983µs"}
> 2024-06-20T17:48:49.162Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:50.163Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:51.164Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:52.165Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:53.166Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:54.168Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:55.168Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:56.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:57.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:58.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:58.237Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "8.146µs"}
> 2024-06-20T17:48:59.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:00.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:01.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:02.171Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:03.171Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:04.172Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:05.173Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:06.174Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:07.131Z DEBUG metrics/metrics_collector.go:68
> Adding current status to historical partition data
> 2024-06-20T17:49:07.131Z DEBUG scheduler/health_checker.go:179
> Scheduler is healthy {"health check values":
> [{"Name":"Scheduling errors","Succeeded":true,"Description":"Check for
> scheduling error entries in metrics","DiagnosisMessage":"There were 0
> scheduling errors logged in the metrics"},{"Name":"Failed
> nodes","Succeeded":true,"Description":"Check for failed nodes entries in
> metrics","DiagnosisMessage":"There were 0 failed nodes logged in the
> metrics"},{"Name":"Negative resources","Succeeded":true,"Description":"Check
> for negative resources in the partitions","DiagnosisMessage":"Partitions with
> negative resources: []"},{"Name":"Negative
> resources","Succeeded":true,"Description":"Check for negative resources in
> the nodes","DiagnosisMessage":"Nodes with negative resources:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if a
> node's allocated resource <= total resource of the
> node","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> total partition resource == sum of the node resources from the
> partition","DiagnosisMessage":"Partitions with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node total resource = allocated resource + occupied resource + available
> resource","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
> with inconsistent data: []"},{"Name":"Reservation
> check","Succeeded":true,"Description":"Check the reservation nr compared to
> the number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
> [0.000000]"},{"Name":"Orphan allocation on node
> check","Succeeded":true,"Description":"Check if there are orphan allocations
> on the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
> allocation on app check","Succeeded":true,"Description":"Check if there are
> orphan allocations on the
> applications","DiagnosisMessage":"OrphanAllocations: []"}]}
> 2024-06-20T17:49:07.174Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:08.158Z DEBUG security/usergroup.go:96
> time consumed cleaning the UserGroupCache {"duration": "2.835µs"}
> 2024-06-20T17:49:08.175Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:08.238Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "7.683µs"}
> 2024-06-20T17:49:09.176Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:10.176Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:11.177Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:12.178Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:13.179Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:13.199Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.66"}
> 2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.66", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5926 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34326439513 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:2110 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<> >
> resources:<key:\"memory\" value:<value:64713940992 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:49:13.200Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.66", "ready": true}
> 2024-06-20T17:49:13.200Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:13.200Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:49:13.200Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:13.200Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:49:14.179Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:15.180Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:16.181Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:17.181Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:18.182Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:18.239Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "8.489µs"}
> 2024-06-20T17:49:19.183Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:20.184Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:21.184Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:22.185Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:23.185Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:24.186Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:25.186Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:26.093Z INFO objects/application.go:440
> Placeholder timeout, releasing placeholders {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
> allocated placeholders on placeholder
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
> message:\"releasing allocated placeholders on placeholder timeout\"
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
> Application placeholder timer cleared {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
> trying to send placeholder timeout events to the original pod from
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver
> spark d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
> spark-version:3.3.2]
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\":
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
> UTC FieldsV1
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
> --properties-file /opt/spark/conf/spark.properties --class
> org.apache.spark.examples.SparkPi
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
> {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
> {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:272
> Request to update terminated pod, removing from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (RemovePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:434
> Removing deleted pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z INFO general/general.go:170
> task completes {"appType": "general", "namespace": "spark", "podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "podStatus": "Succeeded"}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (RemovePod.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:679
> NotifyTaskComplete {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:683
> release allocation {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z DEBUG cache/task.go:479 prepare
> to send release request {"applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "task": "Bound", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.489Z INFO cache/task.go:510
> releasing allocations {"numOfAsksToRelease": 0,
> "numOfAllocationsToRelease": 1}
> 2024-06-20T17:49:26.489Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Bound",
> "destination": "Completed", "event": "CompleteTask"}
> 2024-06-20T17:49:26.489Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"task
> completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
> 2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1245
> removing allocation from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationId":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.489Z DEBUG ugm/manager.go:124
> Decreasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1299
> removing allocation from node {"nodeID": "10.10.10.22",
> "allocationId": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:27.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.22"}
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.22", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:6957 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:36311140158 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
> resources:<key:\"memory\" value:<value:64713949184 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:49:27.341Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:49:27.341Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:27.341Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:49:27.341Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:20060 kubernetes.io/batch-memory:109365558213
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:27.341Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:49:28.188Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:28.240Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "9.056µs"}
> 2024-06-20T17:49:29.189Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:30.189Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> {code}
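> h2. *Key Events in the Log Excerpt Above:*
> * {{17:48:27.093}} - the driver ask is added and the driver placeholder ({{bdb020e8-708c-4eb7-b48c-fba16155941c}}) is replaced with {{terminationType:PLACEHOLDER_REPLACED}}.
> * {{17:48:28}} to {{17:49:25}} - the scheduler logs only periodic {{inspect outstanding requests}} entries; no replacement is attempted for the executor placeholder.
> * {{17:49:26.093}} - the {{1m0s}} placeholder timer fires and the executor placeholder ({{c834ce9b-c084-4c80-aa38-1f9e4bfe8020}}) is released with {{terminationType:TIMEOUT}}.
> h2. *Gang Scheduling Annotations (reconstructed):*
> For reference, below is a sketch of the driver pod's gang-scheduling metadata, reconstructed from the pod spec dumped in the log above. The values are verbatim from that log entry; the YAML layout and the selection of fields are approximated for readability and may not match the submitted manifest exactly.
> {code:yaml}
> # Reconstructed from the logged pod spec; values are verbatim, layout approximated.
> metadata:
>   name: spark-pi-0671699036c3885c-driver
>   namespace: spark
>   labels:
>     queue: root.spark
>     spark-app-name: spark-pi
>     spark-app-selector: spark-e1cdb4ac69504b4aacdc9ec74b0322fb
>     spark-role: driver
>     spark-version: "3.3.2"
>   annotations:
>     yunikorn.apache.org/app-id: spark-e1cdb4ac69504b4aacdc9ec74b0322fb
>     yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
>     yunikorn.apache.org/task-group-name: spark-driver
>     yunikorn.apache.org/task-groups: |
>       [{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"}},
>        {"name": "spark-executor", "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"}}]
> spec:
>   schedulerName: yunikorn
> {code}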
>