[
https://issues.apache.org/jira/browse/AIRFLOW-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jarek Potiuk updated AIRFLOW-6229:
----------------------------------
Fix Version/s: 1.10.8 (was: 2.0.0)
> SparkSubmitOperator polls forever if status json can't find driverState tag
> ---------------------------------------------------------------------------
>
> Key: AIRFLOW-6229
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6229
> Project: Apache Airflow
> Issue Type: New Feature
> Components: scheduler
> Affects Versions: 1.10.6
> Reporter: t oo
> Assignee: t oo
> Priority: Major
> Fix For: 1.10.8
>
>
> If you click ‘release’ on a new Spark cluster while the prior Spark cluster is
> still processing spark-submit jobs from Airflow, Airflow is never able to
> finish the SparkSubmit task. It polls for status on the new cluster build,
> which has no status for the job because the submit happened on the earlier
> cluster build, so the status loop goes on forever.
>
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L446]
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L489]
> The hook loops forever if it can’t find the driverState tag in the JSON
> response. The new build (pointed to by the released DNS name) doesn’t know
> about the driver submitted to the previously released build, so the second
> response below does not contain the driverState tag.
>
> # Response before clicking release on the new build
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "driverState" : "RUNNING",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : true,
>   "workerHostPort" : "reda:31489",
>   "workerId" : "worker-20191202133526-reda-31489"
> }
>
> # Response after clicking release on the new build
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : false
> }
>
>
> This is definitely a defect in the current code. It can be fixed by modifying
> the _process_spark_status_log function to set the driver status to UNKNOWN if
> driverState is not in the response after iterating over all lines.
>
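The fix proposed in the description can be sketched roughly as follows. This is a standalone illustration, not the code from spark_submit_hook.py: the function name parse_driver_status and the regular expression are assumptions made for the example, and the sample lines are taken from the two curl responses quoted in the issue.

```python
import re
from typing import Iterable

# Assumed pattern for a status line such as: "driverState" : "RUNNING",
DRIVER_STATE_RE = re.compile(r'"driverState"\s*:\s*"([^"]+)"')


def parse_driver_status(lines: Iterable[str]) -> str:
    """Hypothetical helper: return the driverState found in the status
    output, or "UNKNOWN" if no line contains a driverState tag."""
    status = "UNKNOWN"  # default when the tag never appears
    for line in lines:
        match = DRIVER_STATE_RE.search(line)
        if match:
            status = match.group(1)
    return status


# Response lines before the new build was released (from the issue).
before = [
    '{',
    '  "action" : "SubmissionStatusResponse",',
    '  "driverState" : "RUNNING",',
    '  "success" : true,',
    '}',
]

# Response lines after the release: no driverState tag is present.
after = [
    '{',
    '  "action" : "SubmissionStatusResponse",',
    '  "submissionId" : "driver-20191202142207-0000",',
    '  "success" : false',
    '}',
]

print(parse_driver_status(before))  # RUNNING
print(parse_driver_status(after))   # UNKNOWN
```

With a default like this, a polling loop that treats UNKNOWN as a terminal state would be able to stop rather than waiting for a status that the new cluster build can never report.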