XD-DENG commented on code in PR #67118:
URL: https://github.com/apache/airflow/pull/67118#discussion_r3290410720


##########
providers/apache/spark/docs/operators.rst:
##########
@@ -181,3 +181,24 @@ Reference
 """""""""
 
 For further information, look at `Apache Spark submitting applications 
<https://spark.apache.org/docs/latest/submitting-applications.html>`_.
+
+Cluster mode crash recovery (Spark standalone)
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+When running in Spark standalone cluster mode (``--deploy-mode cluster``), the 
Spark driver runs
+independently on the cluster. If the Airflow worker dies while the Spark job 
is running, the driver keeps running but
+Airflow loses track of it and the behaviour to submit a brand new job would be 
wasting
+the compute already done.

Review Comment:
   ```suggestion
   the compute already done or even cause conflict if the Spark job itself is 
not designed to be idempotent.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to