[
https://issues.apache.org/jira/browse/YUNIKORN-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865844#comment-17865844
]
Wilfred Spiegelenburg commented on YUNIKORN-2687:
-------------------------------------------------
I agree with [~blue.tzuhua]: this is normal gang scheduling behaviour. You are
running YuniKorn 1.3 and K8s 1.21, which are really old releases. The K8s
version is out of support, and we have fixed numerous jiras since 1.3.
Analysis: it looks like K8s never came back with a confirmation that the
placeholder for the driver was released when we tried to replace it:
{code:java}
2024-06-20T17:48:27.093Z INFO objects/application.go:668
ask added successfully to application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta":
"map[memory:1975517184 pods:1 vcore:1000]"}
...
2024-06-20T17:48:27.093Z INFO scheduler/partition.go:828
scheduler replace placeholder processed {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid":
"a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid":
"bdb020e8-708c-4eb7-b48c-fba16155941c"}
...
2024-06-20T17:48:27.094Z INFO cache/application.go:637 try
to release pod from application {"appID":
"spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
"bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType":
"PLACEHOLDER_REPLACED"}
{code}
That means we simply wait for that to happen; there is nothing more we can do
on the YuniKorn side. It looks like you hit an issue on the K8s side...
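For illustration, the flow in the logs above boils down to: the core marks the placeholder as being replaced, asks the shim to release the placeholder pod, and then waits for K8s to confirm the deletion before the real allocation is finalised. A minimal, hypothetical sketch of that wait (this is not YuniKorn's actual code; all names are invented):

```python
import time

class PlaceholderReplacement:
    """Hypothetical model of the placeholder-replacement handshake.

    The scheduler core requests release of a placeholder pod and then
    waits for Kubernetes to confirm the deletion; only after that is the
    real pod's allocation confirmed."""

    def __init__(self, timeout_seconds=60):
        self.timeout = timeout_seconds
        self.release_confirmed = False

    def confirm_release(self):
        # Called when the shim reports the placeholder pod is gone.
        self.release_confirmed = True

    def wait_for_release(self, poll=0.01, clock=time.monotonic):
        # Block until K8s confirms the deletion, or give up after the
        # timeout -- the situation in this issue, where the confirmation
        # for the driver placeholder never arrived.
        deadline = clock() + self.timeout
        while clock() < deadline:
            if self.release_confirmed:
                return True
            time.sleep(poll)
        return False
```

If the confirmation never arrives, the core can only time out, which is consistent with the 60-second pending period the reporter observed.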
> Placeholder Timeout and Replacement Failure in Gang Scheduling
> --------------------------------------------------------------
>
> Key: YUNIKORN-2687
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2687
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: huangzhir
> Assignee: Tzu-Hua Lan
> Priority: Blocker
>
> h1. *Description:*
> When using gang scheduling with YuniKorn, the driver pod encounters a
> placeholder timeout, which leads to a replacement failure. The pod remains
> in a pending state for approximately 60 seconds.
> h2. *Observed Behavior:*
> * The driver pod ({{spark-pi-d86d1d9036b8e8e9-driver}}) is queued and
> waiting for allocation.
> * The pod belongs to the {{spark-driver}} task group and is scheduled as a
> gang member.
> * A warning indicating "Placeholder timed out" is logged, and the
> placeholder is not replaced successfully.
> * The pod is eventually assigned and bound to a node, and the task completes.
> * There is a 60-second pending period observed for the driver pod.
> h2. *Pod Status:*
> {code:java}
> kubectl get pod -n spark
> NAME READY STATUS
> RESTARTS AGE
> spark-pi-6d2eea9036f9c838-driver 0/1 Pending 0
> 61s
> tg-spark-driver-spark-b459ba53c0654abe8fe6c7-0 1/1 Terminating 0
> 60s
> tg-spark-executor-spark-b459ba53c0654abe8fe6c7-0 1/1 Running 0
> 60s
> kubectl describe pod spark-pi-6d2eea9036f9c838-driver -n spark
> ......
> Type Reason Age From Message
> ---- ------ ---- ---- -------
> Normal Scheduling 2m52s yunikorn
> spark/spark-pi-d86d1d9036b8e8e9-driver is queued and waiting for allocation
> Normal GangScheduling 2m52s yunikorn Pod belongs to the
> taskGroup spark-driver, it will be scheduled as a gang member
> Warning Placeholder timed out 113s yunikorn Application
> spark-37606583a9174b1886d039c353fe5be5 placeholder has been timed out
> Normal Scheduled 100s yunikorn Successfully assigned
> spark/spark-pi-d86d1d9036b8e8e9-driver to node 10.10.10.66
> Normal PodBindSuccessful 100s yunikorn Pod
> spark/spark-pi-d86d1d9036b8e8e9-driver is successfully bound to node
> 10.10.10.66
> Normal TaskCompleted 50s yunikorn Task
> spark/spark-pi-d86d1d9036b8e8e9-driver is completed
> Normal Pulled 99s kubelet Container image
> "apache/spark:v3.3.2" already present on machine
> Normal Created 99s kubelet Created container
> spark-kubernetes-driver
> Normal Started 99s kubelet Started container
> spark-kubernetes-driver{code}
> h2. *Scheduler Logs:*
> {code:java}
> 2024-06-20T17:49:26.093Z INFO objects/application.go:440
> Placeholder timeout, releasing placeholders {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
> allocated placeholders on placeholder
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
> message:\"releasing allocated placeholders on placeholder timeout\"
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
> Application placeholder timer cleared {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
> trying to send placeholder timeout events to the original pod from
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver
> spark d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
> spark-version:3.3.2]
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\":
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
> UTC FieldsV1
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:
schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607
,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
> --properties-file /opt/spark/conf/spark.properties --class
> org.apache.spark.examples.SparkPi
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
> {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
> {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},Nomina
tedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests ... {code}
> h2. *Reproduction Steps:*
> Submit the Spark job with the following command:
> {code:java}
> ./bin/spark-submit \
> --master k8s://https://10.10.10.10:6443 \
> --deploy-mode cluster \
> --name spark-pi \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=sparksa \
> --conf spark.kubernetes.namespace=spark \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.executor.instances=1 \
> --conf spark.executor.cores=1 \
> --conf spark.executor.memory=1500m \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=1500m \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.kubernetes.driver.limit.memory=2G \
> --conf spark.kubernetes.executor.limit.cores=1 \
> --conf spark.kubernetes.executor.limit.memory=2G \
> --conf spark.kubernetes.driver.label.app=spark \
> --conf spark.kubernetes.executor.label.app=spark \
> --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
> --conf spark.kubernetes.scheduler.name=yunikorn \
> --conf spark.kubernetes.driver.label.queue=root.spark \
> --conf spark.kubernetes.executor.label.queue=root.spark \
> --conf
> spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
> --conf
> spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
> --conf
> spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver
> \
> --conf
> spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name":
> "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory":
> "2Gi"},"nodeSelector": {"app": "spark"} }, {"name": "spark-executor",
> "minMember": 1, "minResource": {"cpu": "1", "memory": "2Gi"},"nodeSelector":
> {"app": "spark"} }]' \
> --conf
> spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard' \
> --conf
> spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor
> \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar \
> 3000 {code}
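The task-groups annotation in the command above is what produces the placeholder ask seen later in the scheduler logs ({{placeholderAsk}} of 2 pods, 2000 millicores and 4294967296 bytes of memory): each group contributes minMember × minResource. A small sketch computing that total from the annotation JSON (field names taken from the annotation shown above; the unit parsing here is a simplification that only handles whole "Gi" values):

```python
import json

# The task-groups annotation as passed on spark-submit above
# (nodeSelector omitted; it does not affect the resource total).
task_groups = json.loads('''[
  {"name": "spark-driver",   "minMember": 1,
   "minResource": {"cpu": "1", "memory": "2Gi"}},
  {"name": "spark-executor", "minMember": 1,
   "minResource": {"cpu": "1", "memory": "2Gi"}}
]''')

GI = 1024 ** 3

def placeholder_ask(groups):
    """Sum minMember * minResource over all task groups."""
    pods = sum(g["minMember"] for g in groups)
    vcore = sum(int(g["minResource"]["cpu"]) * 1000 * g["minMember"]
                for g in groups)                     # millicores
    mem = sum(int(g["minResource"]["memory"].rstrip("Gi")) * GI * g["minMember"]
              for g in groups)                       # bytes
    return {"pods": pods, "vcore": vcore, "memory": mem}

print(placeholder_ask(task_groups))
# matches the log: {"memory": 4294967296, "pods": 2, "vcore": 2000}
```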
> h2. Environment:
> * YuniKorn version: 1.3.0
> * Kubernetes version: 1.21.3
> * Spark version: 3.3.2
> h2. *resource-quota.yaml*
> *None*
> h2. yunikorn-configs.yaml
> {code:java}
> apiVersion: v1
> kind: ConfigMap
> metadata:
> name: yunikorn-configs
> namespace: yunikorn
> data:
> log.level: "-1"
> log.admission.level: "-1"
> log.core.config.level: "-1"
> queues.yaml: |
> partitions:
> - name: default
> placementrules:
> - name: tag
> value: namespace
> create: true
> queues:
> - name: root
> submitacl: '*'
> properties:
> application.sort.policy: fifo
> placeholderTimeoutInSeconds: 60
> schedulingStyle: Hard
> queues:
> - name: spark
> properties:
> application.sort.policy: fifo
> placeholderTimeoutInSeconds: 60
> schedulingStyle: Hard
> resources:
> guaranteed:
> vcore: 10
> memory: 10Gi
> max:
> vcore: 10
> memory: 10Gi {code}
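With the queue max configured above (10 vcore, 10Gi), the gang's total placeholder ask (2 pods, 2 vcore, 4Gi) fits comfortably, so queue headroom does not explain the timeout here. A quick check, using the millicore and byte units that appear in the logs:

```python
# Queue max from queues.yaml vs the gang placeholder ask from the logs.
queue_max = {"vcore": 10 * 1000, "memory": 10 * 1024**3}  # millicores, bytes
gang_ask  = {"vcore": 2 * 1000,  "memory": 4 * 1024**3}

fits = all(gang_ask[k] <= queue_max[k] for k in gang_ask)
print(fits)  # True: the ask is well within the queue max
```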
> h2. scheduler logs
> {code:java}
> 2024-06-20T17:48:23.144Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:233
> adding pod to cache {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 55,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":2,"Running":53}}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "spark-pi-0671699036c3885c-driver", "podKey":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "spark-pi-0671699036c3885c-driver", "Namespace": "spark"}
> 2024-06-20T17:48:23.744Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:23.744Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 56,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:737
> AddApplication {"Request":
> {"Metadata":{"ApplicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","QueueName":"root.spark","User":"nobody","Tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
> }]"},"Groups":null,"TaskGroups":[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}},{"name":"spark-executor","minMember":1,"minResource":{"cpu":"1","memory":"2Gi"}}],"OwnerReferences":[{"apiVersion":"v1","kind":"Pod","name":"spark-pi-0671699036c3885c-driver","uid":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","controller":false,"blockOwnerDeletion":true}],"SchedulingPolicyParameters":{},"CreationTime":0}}}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:746 app
> namespace info {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "namespace": "spark"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:773 app
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:23.744Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskState": "New"}
> 2024-06-20T17:48:23.744Z INFO cache/context.go:873 app
> request originating pod added {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "original task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe"}
> 2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
> "Submitted", "event": "SubmitApplication"}
> 2024-06-20T17:48:24.090Z INFO cache/application.go:424
> handle app submission {"app": "applicationID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, queue: root.spark, partition:
> default, totalNumOfTasks: 1, currentState: Submitted", "clusterID":
> "mycluster"}
> 2024-06-20T17:48:24.090Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateApplicationEvent",
> "event":
> {"Request":{"new":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","queueName":"root.spark","partitionName":"[mycluster]default","ugi":{"user":"nobody"},"tags":{"namespace":"spark","yunikorn.apache.org/schedulingPolicyParameters":"placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard","yunikorn.apache.org/task-groups":"[{\"name\":
> \"spark-driver\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }, {\"name\": \"spark-executor\", \"minMember\": 1,
> \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"}
> }]"},"executionTimeoutMilliSeconds":60000,"placeholderAsk":{"resources":{"memory":{"value":4294967296},"pods":{"value":2},"vcore":{"value":2000}}},"gangSchedulingStyle":"Hard"}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z DEBUG placement/placement.go:145
> Executing rule for placing application {"ruleName": "tag",
> "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z DEBUG placement/tag_rule.go:106
> Tag rule intermediate result {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z INFO placement/tag_rule.go:115
> Tag rule application placed {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue": "root.spark"}
> 2024-06-20T17:48:24.090Z DEBUG placement/placement.go:204
> Rule result for placing application {"application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queueName": "root.spark"}
> 2024-06-20T17:48:24.090Z INFO scheduler/context.go:549
> Added application to partition {"applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "partitionName":
> "[mycluster]default", "requested queue": "root.spark", "placed queue":
> "root.spark"}
> 2024-06-20T17:48:24.090Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}],"RejectedApplications":[],"UpdatedApplications":null},
> "currentQueueSize": 0}
> 2024-06-20T17:48:24.090Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "accepted:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" > "}
> 2024-06-20T17:48:24.090Z DEBUG
> callback/scheduler_callback.go:114 callback: response to accepted
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z INFO
> callback/scheduler_callback.go:118 Accepting app {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:24.090Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Submitted",
> "destination": "Accepted", "event": "AcceptApplication"}
> 2024-06-20T17:48:24.145Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:25.091Z DEBUG cache/application.go:516
> postAppAccepted on cached app {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "numTaskGroups": 2,
> "numAllocatedTasks": 0}
> 2024-06-20T17:48:25.091Z INFO cache/application.go:526
> app has taskGroups defined, trying to reserve resources for gang members
> {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb"}
> 2024-06-20T17:48:25.091Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted",
> "destination": "Reserving", "event": "TryReserve"}
> 2024-06-20T17:48:25.102Z INFO cache/placeholder_manager.go:103
> placeholder created {"placeholder": "appID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-driver, podName:
> spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.102Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.102Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskState": "New"}
> 2024-06-20T17:48:25.102Z DEBUG cache/context.go:233
> adding pod to cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 56,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":3,"Running":53}}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:25.102Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z INFO cache/placeholder_manager.go:103
> placeholder created {"placeholder": "appID:
> spark-e1cdb4ac69504b4aacdc9ec74b0322fb, taskGroup: spark-executor, podName:
> spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z DEBUG general/general.go:141
> pod added {"appType": "general", "Name":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "Namespace": "spark"}
> 2024-06-20T17:48:25.105Z DEBUG utils/utils.go:305 Unable
> to retrieve user name from pod labels. Empty user label {"userLabel":
> "yunikorn.apache.org/username"}
> 2024-06-20T17:48:25.105Z DEBUG cache/context.go:841
> AddTask {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z INFO cache/context.go:863 task
> added {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskState": "New"}
> 2024-06-20T17:48:25.105Z DEBUG cache/context.go:233
> adding pod to cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Pre) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":4,"Running":53}}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:25.105Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AddPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:25.146Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Pending",
> "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
> scheduling pod {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/applicationId\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0\" >
> taskGroupName:\"spark-driver\" placeholder:true
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Pending", "destination": "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:275
> scheduling pod {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.092Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/applicationId\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/placeholder\" value:\"true\" >
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0\" >
> taskGroupName:\"spark-executor\" placeholder:true
> preemptionPolicy:<allowPreemptSelf:true allowPreemptOther:true > >
> rmID:\"mycluster\" "}
> 2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-driver","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.092Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/applicationId":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/placeholder":"true","kubernetes.io/label/queue":"root.spark","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"},"taskGroupName":"spark-executor","placeholder":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 1}
> 2024-06-20T17:48:26.092Z INFO objects/application_state.go:133
> Application state transition {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "New", "destination":
> "Accepted", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Accepted","stateTransitionTimestamp":1718905706092997053,"message":"runApplication"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "placeholder": true, "pendingDelta":
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "placeholder": true, "pendingDelta":
> "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> state:\"Accepted\" stateTransitionTimestamp:1718905706092997053
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:137 status update callback received
> {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status":
> "Accepted"}
> 2024-06-20T17:48:26.093Z DEBUG objects/application.go:397
> Application placeholder timer initiated {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
> Increasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:265 Group
> tracker already exists and linking (reusing) the same with application
> {"application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "queue path":
> "root.spark", "user": "nobody", "group": "nobody"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=bdb020e8-708c-4eb7-b48c-fba16155941c,
> allocationKey=470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7, Node=10.10.10.66,
> result=Allocated"}
> 2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
> scheduler allocation processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "uuid":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "allocatedResource":
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true,
> "targetNode": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
> {"RmID":"mycluster","Allocations":[{"allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.66","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-driver","placeholder":true}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse":
> "new:<allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\"
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > nodeID:\"10.10.10.66\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> taskGroupName:\"spark-driver\" placeholder:true > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:53 callback: response to new allocation
> {"allocationKey": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "UUID":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 55,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:476
> Adding assumed pod to cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "node": "10.10.10.66", "allBound":
> true}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.093Z DEBUG scheduler/context.go:853
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.093Z DEBUG cache/task.go:349 bind
> pod volumes {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z INFO cache/context.go:499
> Binding Pod Volumes skipped: all volumes already bound {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.093Z DEBUG cache/task.go:362 bind
> pod {"podName": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0",
> "podUID": "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.093Z INFO client/kubeclient.go:112
> bind pod to node {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "nodeID": "10.10.10.66"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/manager.go:63
> Increasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG ugm/queue_tracker.go:57
> Increasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]"}
> 2024-06-20T17:48:26.093Z DEBUG objects/application.go:339
> Application state timer initiated {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "state": "Starting", "timeout":
> "5m0s"}
> 2024-06-20T17:48:26.093Z INFO objects/application_state.go:133
> Application state transition {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Accepted",
> "destination": "Starting", "event": "runApplication"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMApplicationUpdateEvent",
> "event":
> {"RmID":"mycluster","AcceptedApplications":[],"RejectedApplications":[],"UpdatedApplications":[{"applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","state":"Starting","stateTransitionTimestamp":1718905706093901659,"message":"runApplication"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:26.093Z DEBUG objects/queue.go:1239
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=c834ce9b-c084-4c80-aa38-1f9e4bfe8020,
> allocationKey=4c776fc0-e9c0-4279-806f-8386508d5202, Node=10.10.10.22,
> result=Allocated"}
> 2024-06-20T17:48:26.093Z INFO scheduler/partition.go:888
> scheduler allocation processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "uuid":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "allocatedResource":
> "map[memory:2147483648 pods:1 vcore:1000]", "placeholder": true,
> "targetNode": "10.10.10.22"}
> 2024-06-20T17:48:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMNewAllocationsEvent", "event":
> {"RmID":"mycluster","Allocations":[{"allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","resourcePerAlloc":{"resources":{"memory":{"value":2147483648},"pods":{"value":1},"vcore":{"value":1000}}},"nodeID":"10.10.10.22","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","taskGroupName":"spark-executor","placeholder":true}]},
> "currentQueueSize": 1}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:108 UpdateApplication callback received
> {"UpdateApplicationResponse":
> "updated:<applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> state:\"Starting\" stateTransitionTimestamp:1718905706093901659
> message:\"runApplication\" > "}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:137 status update callback received
> {"appId": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "new status":
> "Starting"}
> 2024-06-20T17:48:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse":
> "new:<allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\"
> resourcePerAlloc:<resources:<key:\"memory\" value:<value:2147483648 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > nodeID:\"10.10.10.22\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> taskGroupName:\"spark-executor\" placeholder:true > "}
> 2024-06-20T17:48:26.094Z DEBUG
> callback/scheduler_callback.go:53 callback: response to new allocation
> {"allocationKey": "4c776fc0-e9c0-4279-806f-8386508d5202", "UUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:476
> Adding assumed pod to cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "node": "10.10.10.22", "allBound":
> true}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (AssumePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.094Z DEBUG scheduler/context.go:853
> Successfully synced shim on new allocation. response: no. of allocations: 1
> 2024-06-20T17:48:26.094Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-06-20T17:48:26.094Z DEBUG cache/task.go:349 bind
> pod volumes {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z INFO cache/context.go:499
> Binding Pod Volumes skipped: all volumes already bound {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.094Z DEBUG cache/task.go:362 bind
> pod {"podName": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0",
> "podUID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.094Z INFO client/kubeclient.go:112
> bind pod to node {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "nodeID": "10.10.10.22"}
> 2024-06-20T17:48:26.097Z INFO cache/task.go:375
> successfully bound pod {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:427
> placeholder is bound {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName":
> "spark-driver"}
> 2024-06-20T17:48:26.098Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7", "taskAlias":
> "spark/tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:375
> successfully bound pod {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:26.098Z INFO cache/task.go:427
> placeholder is bound {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskName":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "taskGroupName":
> "spark-executor"}
> 2024-06-20T17:48:26.098Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source":
> "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-06-20T17:48:26.098Z DEBUG cache/application_state.go:508
> shim app state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "source": "Reserving",
> "destination": "Running", "event": "RunApplication"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.098Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:26.107Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:26.147Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.002Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.014Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
> "spark/spark-pi-0671699036c3885c-driver", "source": "New", "destination":
> "Pending", "event": "InitTask"}
> 2024-06-20T17:48:27.093Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "taskAlias":
> "spark/spark-pi-0671699036c3885c-driver", "source": "Pending", "destination":
> "Scheduling", "event": "SubmitTask"}
> 2024-06-20T17:48:27.093Z DEBUG cache/task.go:275
> scheduling pod {"podName": "spark-pi-0671699036c3885c-driver"}
> 2024-06-20T17:48:27.093Z DEBUG cache/task.go:294 send
> update request {"request":
> "asks:<allocationKey:\"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> resourceAsk:<resources:<key:\"memory\" value:<value:1975517184 > >
> resources:<key:\"pods\" value:<value:1 > > resources:<key:\"vcore\"
> value:<value:1000 > > > maxAllocations:1
> tags:<key:\"kubernetes.io/label/queue\" value:\"root.spark\" >
> tags:<key:\"kubernetes.io/label/spark-app-name\" value:\"spark-pi\" >
> tags:<key:\"kubernetes.io/label/spark-app-selector\"
> value:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\" >
> tags:<key:\"kubernetes.io/label/spark-role\" value:\"driver\" >
> tags:<key:\"kubernetes.io/label/spark-version\" value:\"3.3.2\" >
> tags:<key:\"kubernetes.io/meta/namespace\" value:\"spark\" >
> tags:<key:\"kubernetes.io/meta/podName\"
> value:\"spark-pi-0671699036c3885c-driver\" > taskGroupName:\"spark-driver\"
> Originator:true preemptionPolicy:<allowPreemptSelf:true
> allowPreemptOther:true > > rmID:\"mycluster\" "}
> 2024-06-20T17:48:27.093Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"asks":[{"allocationKey":"d42081ec-a8c4-4fcb-8e40-e4739a67fbfe","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","partitionName":"[mycluster]default","resourceAsk":{"resources":{"memory":{"value":1975517184},"pods":{"value":1},"vcore":{"value":1000}}},"maxAllocations":1,"tags":{"kubernetes.io/label/queue":"root.spark","kubernetes.io/label/spark-app-name":"spark-pi","kubernetes.io/label/spark-app-selector":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","kubernetes.io/label/spark-role":"driver","kubernetes.io/label/spark-version":"3.3.2","kubernetes.io/meta/namespace":"spark","kubernetes.io/meta/podName":"spark-pi-0671699036c3885c-driver"},"taskGroupName":"spark-driver","Originator":true,"preemptionPolicy":{"allowPreemptSelf":true,"allowPreemptOther":true}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z INFO objects/application.go:668
> ask added successfully to application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "ask":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "placeholder": false, "pendingDelta":
> "map[memory:1975517184 pods:1 vcore:1000]"}
> 2024-06-20T17:48:27.093Z DEBUG objects/queue.go:1275
> allocation found on queue {"queueName": "root.spark", "appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocation":
> "applicationID=spark-e1cdb4ac69504b4aacdc9ec74b0322fb,
> uuid=a53e9cbb-931d-4d1d-95e5-8e2425ba95be,
> allocationKey=d42081ec-a8c4-4fcb-8e40-e4739a67fbfe, Node=10.10.10.66,
> result=Replaced"}
> 2024-06-20T17:48:27.093Z INFO scheduler/partition.go:828
> scheduler replace placeholder processed {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationKey":
> "d42081ec-a8c4-4fcb-8e40-e4739a67fbfe", "uuid":
> "a53e9cbb-931d-4d1d-95e5-8e2425ba95be", "placeholder released uuid":
> "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"bdb020e8-708c-4eb7-b48c-fba16155941c","terminationType":4,"message":"replacing
> uuid:
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be","allocationKey":"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"bdb020e8-708c-4eb7-b48c-fba16155941c\"
> terminationType:PLACEHOLDER_REPLACED message:\"replacing uuid:
> a53e9cbb-931d-4d1d-95e5-8e2425ba95be\"
> allocationKey:\"470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7\" > "}
> 2024-06-20T17:48:27.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "bdb020e8-708c-4eb7-b48c-fba16155941c"}
> 2024-06-20T17:48:27.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 2, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.093Z DEBUG scheduler/context.go:884
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:48:27.094Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "bdb020e8-708c-4eb7-b48c-fba16155941c", "terminationType":
> "PLACEHOLDER_REPLACED"}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.099Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.147Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.22"}
> 2024-06-20T17:48:27.290Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.290Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.22", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5963 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34165359566 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
> resources:<key:\"memory\" value:<value:64713949184 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:48:27.290Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:48:27.290Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:48:27.290Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5963},"kubernetes.io/batch-memory":{"value":34165359566},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:48:27.290Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:20075 kubernetes.io/batch-memory:109430536751
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:48:27.290Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":5,"Running":53}}
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-driver-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "470fa7c0-3fa5-4d2a-8ca6-b2af492b8be7"}
> 2024-06-20T17:48:27.421Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 1, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":4,"Running":54}}
> 2024-06-20T17:48:27.905Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:48:27.906Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:48:28.148Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:28.234Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "9.409µs"}
> 2024-06-20T17:48:29.149Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:30.149Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:31.150Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:32.151Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:33.151Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:34.152Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:35.153Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:36.154Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:37.131Z DEBUG scheduler/health_checker.go:179
> Scheduler is healthy {"health check values":
> [{"Name":"Scheduling errors","Succeeded":true,"Description":"Check for
> scheduling error entries in metrics","DiagnosisMessage":"There were 0
> scheduling errors logged in the metrics"},{"Name":"Failed
> nodes","Succeeded":true,"Description":"Check for failed nodes entries in
> metrics","DiagnosisMessage":"There were 0 failed nodes logged in the
> metrics"},{"Name":"Negative resources","Succeeded":true,"Description":"Check
> for negative resources in the partitions","DiagnosisMessage":"Partitions with
> negative resources: []"},{"Name":"Negative
> resources","Succeeded":true,"Description":"Check for negative resources in
> the nodes","DiagnosisMessage":"Nodes with negative resources:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if a
> node's allocated resource <= total resource of the
> node","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> total partition resource == sum of the node resources from the
> partition","DiagnosisMessage":"Partitions with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node total resource = allocated resource + occupied resource + available
> resource","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
> with inconsistent data: []"},{"Name":"Reservation
> check","Succeeded":true,"Description":"Check the reservation nr compared to
> the number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
> [0.000000]"},{"Name":"Orphan allocation on node
> check","Succeeded":true,"Description":"Check if there are orphan allocations
> on the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
> allocation on app check","Succeeded":true,"Description":"Check if there are
> orphan allocations on the
> applications","DiagnosisMessage":"OrphanAllocations: []"}]}
> 2024-06-20T17:48:37.154Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:38.155Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:38.235Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "6.423µs"}
> 2024-06-20T17:48:39.155Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:40.156Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:41.156Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:42.157Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:43.158Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:44.158Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:45.159Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:46.160Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:47.161Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:48.162Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:48.236Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "7.983µs"}
> 2024-06-20T17:48:49.162Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:50.163Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:51.164Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:52.165Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:53.166Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:54.168Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:55.168Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:56.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:57.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:58.169Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:48:58.237Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "8.146µs"}
> 2024-06-20T17:48:59.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:00.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:01.170Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:02.171Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:03.171Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:04.172Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:05.173Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:06.174Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:07.131Z DEBUG metrics/metrics_collector.go:68
> Adding current status to historical partition data
> 2024-06-20T17:49:07.131Z DEBUG scheduler/health_checker.go:179
> Scheduler is healthy {"health check values":
> [{"Name":"Scheduling errors","Succeeded":true,"Description":"Check for
> scheduling error entries in metrics","DiagnosisMessage":"There were 0
> scheduling errors logged in the metrics"},{"Name":"Failed
> nodes","Succeeded":true,"Description":"Check for failed nodes entries in
> metrics","DiagnosisMessage":"There were 0 failed nodes logged in the
> metrics"},{"Name":"Negative resources","Succeeded":true,"Description":"Check
> for negative resources in the partitions","DiagnosisMessage":"Partitions with
> negative resources: []"},{"Name":"Negative
> resources","Succeeded":true,"Description":"Check for negative resources in
> the nodes","DiagnosisMessage":"Nodes with negative resources:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if a
> node's allocated resource <= total resource of the
> node","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> total partition resource == sum of the node resources from the
> partition","DiagnosisMessage":"Partitions with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node total resource = allocated resource + occupied resource + available
> resource","DiagnosisMessage":"Nodes with inconsistent data:
> []"},{"Name":"Consistency of data","Succeeded":true,"Description":"Check if
> node capacity >= allocated resources on the node","DiagnosisMessage":"Nodes
> with inconsistent data: []"},{"Name":"Reservation
> check","Succeeded":true,"Description":"Check the reservation nr compared to
> the number of nodes","DiagnosisMessage":"Reservation/node nr ratio:
> [0.000000]"},{"Name":"Orphan allocation on node
> check","Succeeded":true,"Description":"Check if there are orphan allocations
> on the nodes","DiagnosisMessage":"Orphan allocations: []"},{"Name":"Orphan
> allocation on app check","Succeeded":true,"Description":"Check if there are
> orphan allocations on the
> applications","DiagnosisMessage":"OrphanAllocations: []"}]}
> 2024-06-20T17:49:07.174Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:08.158Z DEBUG security/usergroup.go:96
> time consumed cleaning the UserGroupCache {"duration": "2.835µs"}
> 2024-06-20T17:49:08.175Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:08.238Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "7.683µs"}
> 2024-06-20T17:49:09.176Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:10.176Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:11.177Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:12.178Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:13.179Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:13.199Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.66"}
> 2024-06-20T17:49:13.200Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:13.200Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.66", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:5926 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:34326439513 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:2110 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<> >
> resources:<key:\"memory\" value:<value:64713940992 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:49:13.200Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.66", "ready": true}
> 2024-06-20T17:49:13.200Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:13.200Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.66","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":5926},"kubernetes.io/batch-memory":{"value":34326439513},"kubernetes.io/mid-cpu":{"value":2110},"kubernetes.io/mid-memory":{},"memory":{"value":64713940992},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":4229955584},"pods":{"value":16},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:49:13.200Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19075 kubernetes.io/batch-memory:107280597093
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:13.200Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:49:14.179Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:15.180Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:16.181Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:17.181Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:18.182Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:18.239Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "8.489µs"}
> 2024-06-20T17:49:19.183Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:20.184Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:21.184Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:22.185Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:23.185Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:24.186Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:25.186Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:26.093Z INFO objects/application.go:440
> Placeholder timeout, releasing placeholders {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "placeholders being replaced": 1,
> "releasing placeholders": 1}
> 2024-06-20T17:49:26.093Z DEBUG rmproxy/rmproxy.go:59
> enqueue event {"eventType": "*rmevent.RMReleaseAllocationEvent",
> "event":
> {"RmID":"mycluster","ReleasedAllocations":[{"partitionName":"default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"releasing
> allocated placeholders on placeholder
> timeout","allocationKey":"4c776fc0-e9c0-4279-806f-8386508d5202"}]},
> "currentQueueSize": 0}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:48 UpdateAllocation callback received
> {"UpdateAllocationResponse": "released:<partitionName:\"default\"
> applicationID:\"spark-e1cdb4ac69504b4aacdc9ec74b0322fb\"
> UUID:\"c834ce9b-c084-4c80-aa38-1f9e4bfe8020\" terminationType:TIMEOUT
> message:\"releasing allocated placeholders on placeholder timeout\"
> allocationKey:\"4c776fc0-e9c0-4279-806f-8386508d5202\" > "}
> 2024-06-20T17:49:26.093Z DEBUG
> callback/scheduler_callback.go:81 callback: response to released
> allocations {"UUID": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:26.093Z DEBUG cache/context.go:620
> forget pod {"pod": "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:501
> Removing assumed pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.093Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (ForgetPod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:1947
> Successfully synced shim on released allocations. response: no. of
> allocations: 1
> 2024-06-20T17:49:26.093Z DEBUG objects/application.go:409
> Application placeholder timer cleared {"AppID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "Timeout": "1m0s"}
> 2024-06-20T17:49:26.093Z INFO cache/application.go:637
> try to release pod from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.106Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.109Z DEBUG cache/application.go:692
> trying to send placeholder timeout events to the original pod from
> application {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "app
> request originating pod": "&Pod{ObjectMeta:{spark-pi-0671699036c3885c-driver
> spark d42081ec-a8c4-4fcb-8e40-e4739a67fbfe 7721707 0 2024-06-20 17:48:23
> +0000 UTC <nil> <nil> map[queue:root.spark spark-app-name:spark-pi
> spark-app-selector:spark-e1cdb4ac69504b4aacdc9ec74b0322fb spark-role:driver
> spark-version:3.3.2]
> map[yunikorn.apache.org/app-id:spark-e1cdb4ac69504b4aacdc9ec74b0322fb
> yunikorn.apache.org/schedulingPolicyParameters:placeholderTimeoutInSeconds=60
> gangSchedulingStyle=Hard yunikorn.apache.org/task-group-name:spark-driver
> yunikorn.apache.org/task-groups:[{\"name\": \"spark-driver\", \"minMember\":
> 1, \"minResource\": {\"cpu\": \"1\", \"memory\": \"2Gi\"} }, {\"name\":
> \"spark-executor\", \"minMember\": 1, \"minResource\": {\"cpu\": \"1\",
> \"memory\": \"2Gi\"} }]] [] [] [{okhttp Update v1 2024-06-20 17:48:23 +0000
> UTC FieldsV1
> {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:yunikorn.apache.org/app-id\":{},\"f:yunikorn.apache.org/schedulingPolicyParameters\":{},\"f:yunikorn.apache.org/task-group-name\":{},\"f:yunikorn.apache.org/task-groups\":{}},\"f:labels\":{\".\":{},\"f:queue\":{},\"f:spark-app-name\":{},\"f:spark-app-selector\":{},\"f:spark-role\":{},\"f:spark-version\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"spark-kubernetes-driver\\\"}\":{\".\":{},\"f:args\":{},\"f:env\":{\".\":{},\"k:{\\\"name\\\":\\\"SPARK_APPLICATION_ID\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_CONF_DIR\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_DRIVER_BIND_ADDRESS\\\"}\":{\".\":{},\"f:name\":{},\"f:valueFrom\":{\".\":{},\"f:fieldRef\":{\".\":{},\"f:apiVersion\":{},\"f:fieldPath\":{}}}},\"k:{\\\"name\\\":\\\"SPARK_LOCAL_DIRS\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}},\"k:{\\\"name\\\":\\\"SPARK_USER\\\"}\":{\".\":{},\"f:name\":{},\"f:value\":{}}},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:ports\":{\".\":{},\"k:{\\\"containerPort\\\":4040,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7078,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}},\"k:{\\\"containerPort\\\":7079,\\\"protocol\\\":\\\"TCP\\\"}\":{\".\":{},\"f:containerPort\":{},\"f:name\":{},\"f:protocol\":{}}},\"f:resources\":{\".\":{},\"f:limits\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}},\"f:requests\":{\".\":{},\"f:cpu\":{},\"f:memory\":{}}},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{},\"f:volumeMounts\":{\".\":{},\"k:{\\\"mountPath\\\":\\\"/opt/spark/conf\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}},\"k:{\\\"mountPath\\\":\\\"/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8\\\"}\":{\".\":{},\"f:mountPath\":{},\"f:name\":{}}}}},\"f:dnsPolicy\":{},\"f:enableServiceLinks\":{},\"f:restartPolicy\":{},\"f:
schedulerName\":{},\"f:securityContext\":{},\"f:serviceAccount\":{},\"f:serviceAccountName\":{},\"f:terminationGracePeriodSeconds\":{},\"f:volumes\":{\".\":{},\"k:{\\\"name\\\":\\\"spark-conf-volume-driver\\\"}\":{\".\":{},\"f:configMap\":{\".\":{},\"f:defaultMode\":{},\"f:items\":{},\"f:name\":{}},\"f:name\":{}},\"k:{\\\"name\\\":\\\"spark-local-dir-1\\\"}\":{\".\":{},\"f:emptyDir\":{},\"f:name\":{}}}}}
>
> }]},Spec:PodSpec{Volumes:[]Volume{Volume{Name:spark-local-dir-1,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:&EmptyDirVolumeSource{Medium:,SizeLimit:<nil>,},GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:spark-conf-volume-driver,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:&ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:spark-drv-88f2089036c38c16-conf-map,},Items:[]KeyToPath{KeyToPath{Key:log4j2.properties,Path:log4j2.properties,Mode:*420,},KeyToPath{Key:spark-env.sh,Path:spark-env.sh,Mode:*420,},KeyToPath{Key:spark.properties,Path:spark.properties,Mode:*420,},},DefaultMode:*420,Optional:nil,},VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,Ephemeral:nil,},},Volume{Name:kube-api-access-mqkhd,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:nil,NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:&ProjectedVolumeSource{Sources:[]VolumeProjection{VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:nil,ServiceAccountToken:&ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607
> ,Path:token,},},VolumeProjection{Secret:nil,DownwardAPI:nil,ConfigMap:&ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[]KeyToPath{KeyToPath{Key:ca.crt,Path:ca.crt,Mode:nil,},},Optional:nil,},ServiceAccountToken:nil,},VolumeProjection{Secret:nil,DownwardAPI:&DownwardAPIProjection{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:namespace,FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,Mode:nil,},},},ConfigMap:nil,ServiceAccountToken:nil,},},DefaultMode:*420,},StorageOS:nil,CSI:nil,Ephemeral:nil,},},},Containers:[]Container{Container{Name:spark-kubernetes-driver,Image:apache/spark:v3.3.2,Command:[],Args:[driver
> --properties-file /opt/spark/conf/spark.properties --class
> org.apache.spark.examples.SparkPi
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
> 3000],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:driver-rpc-port,HostPort:0,ContainerPort:7078,Protocol:TCP,HostIP:,},ContainerPort{Name:blockmanager,HostPort:0,ContainerPort:7079,Protocol:TCP,HostIP:,},ContainerPort{Name:spark-ui,HostPort:0,ContainerPort:4040,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:SPARK_USER,Value:root,ValueFrom:nil,},EnvVar{Name:SPARK_APPLICATION_ID,Value:spark-e1cdb4ac69504b4aacdc9ec74b0322fb,ValueFrom:nil,},EnvVar{Name:SPARK_DRIVER_BIND_ADDRESS,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:SPARK_LOCAL_DIRS,Value:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,ValueFrom:nil,},EnvVar{Name:SPARK_CONF_DIR,Value:/opt/spark/conf,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{cpu:
> {{1 0} {<nil>} 1 DecimalSI},memory: {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},Requests:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},memory:
> {{1975517184 0} {<nil>} 1884Mi
> BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:spark-local-dir-1,ReadOnly:false,MountPath:/var/data/spark-c52828ab-6e29-4927-af32-04feeefc62b8,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:spark-conf-volume-driver,ReadOnly:false,MountPath:/opt/spark/conf,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:kube-api-access-mqkhd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Never,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:sparksa,DeprecatedServiceAccount:sparksa,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,FSGroupChangePolicy:nil,SeccompProfile:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:yunikorn,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,},Status:PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:Guaranteed,InitContainerStatuses:[]ContainerStatus{},Nomina
> tedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:411
> Putting pod in cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.223Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdatePod.Post) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:272
> Request to update terminated pod, removing from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0"}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (RemovePod.Pre) {"nodes": 3, "pods": 58,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 57,
> "phases": {"Pending":3,"Running":55}}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:434
> Removing deleted pod from cache {"podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podKey":
> "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z INFO general/general.go:170
> task completes {"appType": "general", "namespace": "spark", "podName":
> "tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "podUID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "podStatus": "Succeeded"}
> 2024-06-20T17:49:26.488Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (RemovePod.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:679
> NotifyTaskComplete {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z DEBUG cache/context.go:683
> release allocation {"appID": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "taskID": "4c776fc0-e9c0-4279-806f-8386508d5202"}
> 2024-06-20T17:49:26.488Z DEBUG cache/task.go:479 prepare
> to send release request {"applicationID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "taskID":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "allocationUUID":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "task": "Bound", "terminationType":
> "TIMEOUT"}
> 2024-06-20T17:49:26.489Z INFO cache/task.go:510
> releasing allocations {"numOfAsksToRelease": 0,
> "numOfAllocationsToRelease": 1}
> 2024-06-20T17:49:26.489Z INFO cache/task_state.go:380
> Task state transition {"app":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "task":
> "4c776fc0-e9c0-4279-806f-8386508d5202", "taskAlias":
> "spark/tg-spark-executor-spark-e1cdb4ac69504b4aacdc9e-0", "source": "Bound",
> "destination": "Completed", "event": "CompleteTask"}
> 2024-06-20T17:49:26.489Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateAllocationEvent",
> "event":
> {"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-e1cdb4ac69504b4aacdc9ec74b0322fb","UUID":"c834ce9b-c084-4c80-aa38-1f9e4bfe8020","terminationType":2,"message":"task
> completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
> 2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1245
> removing allocation from application {"appID":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "allocationId":
> "c834ce9b-c084-4c80-aa38-1f9e4bfe8020", "terminationType": "TIMEOUT"}
> 2024-06-20T17:49:26.489Z DEBUG ugm/manager.go:124
> Decreasing resource usage {"user": "nobody", "queue path":
> "root.spark", "application": "spark-e1cdb4ac69504b4aacdc9ec74b0322fb",
> "resource": "map[memory:2147483648 pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "root.spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z DEBUG ugm/queue_tracker.go:82
> Decreasing resource usage {"queue path": "spark", "application":
> "spark-e1cdb4ac69504b4aacdc9ec74b0322fb", "resource": "map[memory:2147483648
> pods:1 vcore:1000]", "removeApp": false}
> 2024-06-20T17:49:26.489Z INFO scheduler/partition.go:1299
> removing allocation from node {"nodeID": "10.10.10.22",
> "allocationId": "c834ce9b-c084-4c80-aa38-1f9e4bfe8020"}
> 2024-06-20T17:49:27.187Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Pre) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:179
> Updating node in cache {"nodeName": "10.10.10.22"}
> 2024-06-20T17:49:27.340Z DEBUG external/scheduler_cache.go:558
> Scheduler cache state (UpdateNode.Post) {"nodes": 3, "pods": 57,
> "assumed": 0, "pendingAllocs": 0, "inProgressAllocs": 0, "podsAssigned": 56,
> "phases": {"Pending":3,"Running":54}}
> 2024-06-20T17:49:27.340Z DEBUG cache/node.go:109 set
> node capacity {"nodeID": "10.10.10.22", "capacity":
> "resources:<key:\"ephemeral-storage\" value:<value:478652923105 > >
> resources:<key:\"hugepages-1Gi\" value:<> > resources:<key:\"hugepages-2Mi\"
> value:<> > resources:<key:\"kubernetes.io/batch-cpu\" value:<value:6957 > >
> resources:<key:\"kubernetes.io/batch-memory\" value:<value:36311140158 > >
> resources:<key:\"kubernetes.io/mid-cpu\" value:<value:3629 > >
> resources:<key:\"kubernetes.io/mid-memory\" value:<value:2872265011 > >
> resources:<key:\"memory\" value:<value:64713949184 > >
> resources:<key:\"pods\" value:<value:110 > > resources:<key:\"vcore\"
> value:<value:14000 > > "}
> 2024-06-20T17:49:27.341Z INFO cache/nodes.go:179 Node's
> ready status flag {"Node name": "10.10.10.22", "ready": true}
> 2024-06-20T17:49:27.341Z INFO cache/nodes.go:184 report
> updated nodes to scheduler {"request":
> {"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}}
> 2024-06-20T17:49:27.341Z DEBUG scheduler/scheduler.go:96
> enqueued event {"eventType": "*rmevent.RMUpdateNodeEvent", "event":
> {"Request":{"nodes":[{"nodeID":"10.10.10.22","action":2,"attributes":{"ready":"true","si/node-partition":"[mycluster]default"},"schedulableResource":{"resources":{"ephemeral-storage":{"value":478652923105},"hugepages-1Gi":{},"hugepages-2Mi":{},"kubernetes.io/batch-cpu":{"value":6957},"kubernetes.io/batch-memory":{"value":36311140158},"kubernetes.io/mid-cpu":{"value":3629},"kubernetes.io/mid-memory":{"value":2872265011},"memory":{"value":64713949184},"pods":{"value":110},"vcore":{"value":14000}}},"occupiedResource":{"resources":{"memory":{"value":7295991808},"pods":{"value":20},"vcore":{"value":4912}}}}],"rmID":"mycluster"}},
> "currentQueueSize": 0}
> 2024-06-20T17:49:27.341Z INFO objects/queue.go:1190
> updating root queue max resources {"current max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:19066 kubernetes.io/batch-memory:107219777621
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]", "new max":
> "map[ephemeral-storage:1435958769315 hugepages-1Gi:0 hugepages-2Mi:0
> kubernetes.io/batch-cpu:20060 kubernetes.io/batch-memory:109365558213
> kubernetes.io/mid-cpu:10967 kubernetes.io/mid-memory:5430089621
> memory:194141839360 pods:330 vcore:42000]"}
> 2024-06-20T17:49:27.341Z DEBUG objects/node.go:182 skip
> updating occupiedResource, not changed
> 2024-06-20T17:49:28.188Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:28.240Z DEBUG
> scheduler/partition_manager.go:83 time consumed for queue cleaner
> {"duration": "9.056µs"}
> 2024-06-20T17:49:29.189Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> 2024-06-20T17:49:30.189Z DEBUG scheduler/scheduler.go:157
> inspect outstanding requests
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)