GitHub user hezeclark added a comment to the discussion: Airflow task failed but spark kube app is running
This is a common issue with Spark on Kubernetes in Airflow: the Airflow task is marked as failed, but the Spark application keeps running in the cluster. This happens because Spark on Kubernetes runs asynchronously by default. Here are the root causes and fixes:

**Root cause 1: Using SparkSubmitOperator without proper failure handling.** The SparkSubmitOperator submits the job but may time out or lose its connection before the Spark app finishes, so Airflow fails the task while the driver pod keeps running.

**Root cause 2: KubernetesPodOperator without proper cleanup.** If the pod that launched the job is killed or the task times out, the driver and executor pods that spark-submit created in the cluster are not deleted with it.

**Fix for orphaned Spark apps:** Add a cleanup task that runs even when the main task fails (e.g. with `trigger_rule="all_done"`) and deletes any leftover Spark pods.

Are you using SparkKubernetesOperator (from apache-airflow-providers-cncf-kubernetes), SparkSubmitOperator, or a custom operator?

GitHub link: https://github.com/apache/airflow/discussions/63298#discussioncomment-16113619
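A minimal DAG sketch of the cleanup pattern described above, assuming SparkSubmitOperator. The dag id, connection id, application path, and namespace are hypothetical; the key ideas are labeling the driver pod at submit time and using `TriggerRule.ALL_DONE` so the cleanup runs regardless of the submit task's outcome:

```python
# Hypothetical DAG sketch: label the Spark driver pod at submit time,
# then delete any leftover pods even when the submit task fails.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(
    dag_id="spark_with_cleanup",              # hypothetical dag id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_spark_job",
        conn_id="spark_k8s",                  # hypothetical connection id
        application="local:///opt/app/job.py",  # hypothetical app path
        # Label the driver pod so the cleanup task can find it later.
        # ts_nodash is used because run_id can contain characters that
        # are invalid in Kubernetes label values.
        conf={"spark.kubernetes.driver.label.airflow-run": "{{ ts_nodash }}"},
        execution_timeout=timedelta(hours=1),
    )

    cleanup = BashOperator(
        task_id="cleanup_orphaned_spark_pods",
        # ALL_DONE runs this task whether submit succeeded or failed.
        trigger_rule=TriggerRule.ALL_DONE,
        bash_command=(
            "kubectl delete pod -n spark "    # hypothetical namespace
            "-l airflow-run={{ ts_nodash }} --ignore-not-found"
        ),
    )

    submit >> cleanup
```

`spark.kubernetes.driver.label.*` is a standard Spark-on-Kubernetes config prefix for attaching labels to the driver pod; executors can be labeled the same way via `spark.kubernetes.executor.label.*` if they also need cleanup.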
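The cleanup command itself can be kept in plain Python so it is easy to unit-test outside Airflow. This hypothetical helper builds the `kubectl` invocation from a namespace and the run label assumed to have been set at submit time:

```python
# Hypothetical helper: build the kubectl command that deletes Spark
# driver/executor pods tagged with an Airflow-provided label. Assumes
# the pods were labeled at submit time, e.g. via
# spark.kubernetes.driver.label.airflow-run=<value>.
def build_cleanup_cmd(namespace: str, run_label: str) -> list[str]:
    return [
        "kubectl", "delete", "pod",
        "-n", namespace,
        "-l", f"airflow-run={run_label}",
        "--ignore-not-found",  # succeed even if nothing is left to delete
    ]


print(" ".join(build_cleanup_cmd("spark", "20240101T000000")))
# kubectl delete pod -n spark -l airflow-run=20240101T000000 --ignore-not-found
```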
