ywan2017 opened a new pull request #9081:
URL: https://github.com/apache/airflow/pull/9081


   #8963 
   
   ## Description
   
   I am using Airflow's SparkSubmitOperator to schedule my Spark jobs on a 
Kubernetes cluster. 
   
   However, Kubernetes often throws a 'too old resource version' exception, 
which interrupts the Spark watcher. Airflow then loses the log stream and 
never receives the 'Exit Code', so it marks the job as failed as soon as the 
log stream is lost, even though the job is still running.
   
   This PR adds a simple retry mechanism: when the log stream is interrupted, 
it calls 'read_namespaced_pod()', a method provided by the Kubernetes client 
API, to get the Spark driver pod status.
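   
   As a rough illustration of the idea (not the code in this PR), the fallback 
could look like the sketch below. The function name `wait_for_driver_pod` and 
the parameters `poll_interval` and `max_polls` are hypothetical; `core_v1` is 
assumed to behave like `kubernetes.client.CoreV1Api`, whose 
`read_namespaced_pod(name, namespace)` returns a pod object exposing 
`status.phase`.

```python
import time

# Kubernetes pod phases after which the driver pod will not change state again.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_for_driver_pod(core_v1, pod_name, namespace,
                        poll_interval=10.0, max_polls=60):
    """Poll the Spark driver pod once the log stream has been lost.

    Hypothetical helper: `core_v1` is assumed to offer the same
    `read_namespaced_pod(name=..., namespace=...)` call as
    kubernetes.client.CoreV1Api. Returns the last phase observed, which is
    terminal ("Succeeded" or "Failed") unless `max_polls` ran out first.
    """
    phase = None
    for _ in range(max_polls):
        pod = core_v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        phase = pod.status.phase
        if phase in TERMINAL_PHASES:
            return phase
        # Driver is still pending or running; wait before asking again.
        time.sleep(poll_interval)
    return phase
```

   A caller would then treat "Succeeded" as a successful run and anything else 
as a failure or a timeout, instead of failing the task the moment the log 
stream drops.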
   
   ## Target Github ISSUE
   
   https://github.com/apache/airflow/issues/8963
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [ ] Description above provides context of the change
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [ ] Target Github ISSUE in description if exists
   - [ ] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

