[ 
https://issues.apache.org/jira/browse/SPARK-49804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Dyagilev updated SPARK-49804:
-------------------------------------
    Description: 
When deploying Spark pods on Kubernetes with sidecars, the exit code reported 
for an executor may be incorrect.

For example, the reported exit code is 0, but the executor container's actual 
exit code is 52 (OOM).
{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
 - Lost executor 1 on XXXXX: The executor with id 1 exited with exit code 0(success).
  
The API gave the following container statuses:
 
     container name: fluentd
     container image: docker-images-release.XXXXX.com/XXXXX/fluentd:XXXXX
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:34:52Z
     exit code: 0
     termination reason: Completed
 
     container name: istio-proxy
     container image: docker-images-release.XXXXX.com/XXXXX-istio/proxyv2:XXXXX
     container state: running
     container started at: 2024-09-25T02:32:16Z
 
     container name: spark-kubernetes-executor
     container image: docker-dev-artifactory.XXXXX.com/XXXXX/spark-XXXXX:XXXXX
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:35:28Z
     exit code: 52
     termination reason: Error
{code}

  was:
When deploying Spark pods on Kubernetes with sidecars, the reported executor's 
exit code may be incorrect.

For example, the reported executor's exit code is 0, but the actual is 52.
{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
 - Lost executor 1 on XXXXX: The executor with id 1 exited with exit code 
0(success).
  
The API gave the following container statuses:
 
     container name: fluentd
     container image: docker-images-release.XXXXX.com/XXXXX/fluentd:XXXXX
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:34:52Z
     exit code: 0
     termination reason: Completed
 
     container name: istio-proxy
     container image: docker-images-release.XXXXX.com/XXXXX-istio/proxyv2:XXXXX
     container state: running
     container started at: 2024-09-25T02:32:16Z
 
     container name: spark-kubernetes-executor
     container image: docker-dev-artifactory.XXXXX.com/XXXXX/spark-XXXXX:XXXXX
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:35:28Z
     exit code: 52
     termination reason: Error {code}


> Incorrect exit code on Kubernetes when deploying with sidecars
> --------------------------------------------------------------
>
>                 Key: SPARK-49804
>                 URL: https://issues.apache.org/jira/browse/SPARK-49804
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1, 3.4.3, 3.5.3
>            Reporter: Oleksiy Dyagilev
>            Priority: Minor
>
> When deploying Spark pods on Kubernetes with sidecars, the exit code 
> reported for an executor may be incorrect.
> For example, the reported exit code is 0, but the executor container's 
> actual exit code is 52 (OOM).
> {code:java}
> 2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
> org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
>  - Lost executor 1 on XXXXX: The executor with id 1 exited with exit code 0(success).
>   
> The API gave the following container statuses:
>  
>      container name: fluentd
>      container image: docker-images-release.XXXXX.com/XXXXX/fluentd:XXXXX
>      container state: terminated
>      container started at: 2024-09-25T02:32:17Z
>      container finished at: 2024-09-25T02:34:52Z
>      exit code: 0
>      termination reason: Completed
>  
>      container name: istio-proxy
>      container image: docker-images-release.XXXXX.com/XXXXX-istio/proxyv2:XXXXX
>      container state: running
>      container started at: 2024-09-25T02:32:16Z
>  
>      container name: spark-kubernetes-executor
>      container image: docker-dev-artifactory.XXXXX.com/XXXXX/spark-XXXXX:XXXXX
>      container state: terminated
>      container started at: 2024-09-25T02:32:17Z
>      container finished at: 2024-09-25T02:35:28Z
>      exit code: 52
>      termination reason: Error
> {code}
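
The mismatch in the log above can be reproduced with a small sketch. It assumes, without confirmation from this report, that the reported code comes from the first terminated container in the pod's status list rather than from the executor container. The `ContainerStatus` class and both helper functions are hypothetical and are not Spark or Kubernetes client APIs; only the container names, states, and exit codes are taken from the log.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContainerStatus:
    """Hypothetical, simplified view of one container status entry."""
    name: str
    state: str                       # "running" or "terminated"
    exit_code: Optional[int] = None  # only set for terminated containers


def first_terminated_exit_code(statuses: Sequence[ContainerStatus]) -> Optional[int]:
    """Assumed faulty lookup: take the first terminated container, whatever it is."""
    for status in statuses:
        if status.state == "terminated":
            return status.exit_code
    return None


def executor_exit_code(
    statuses: Sequence[ContainerStatus],
    executor_container: str = "spark-kubernetes-executor",
) -> Optional[int]:
    """Look up the exit code of the executor container by name, ignoring sidecars."""
    for status in statuses:
        if status.name == executor_container and status.state == "terminated":
            return status.exit_code
    return None


# Container statuses in the order listed in the log above.
statuses = [
    ContainerStatus("fluentd", "terminated", 0),
    ContainerStatus("istio-proxy", "running"),
    ContainerStatus("spark-kubernetes-executor", "terminated", 52),
]

print(first_terminated_exit_code(statuses))  # 0, the sidecar's exit code
print(executor_exit_code(statuses))          # 52, the executor's exit code
```

Selecting the container by name keeps a sidecar that exits cleanly (here `fluentd`) from masking the executor's failure. The name used as the default is the one shown in the log; deployments that rename the executor container would need to pass their own value.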



