[GitHub] [airflow] MammutMKII opened a new issue #19384: Add retries to LivyOperator polling / LivyHook

GitBox Wed, 03 Nov 2021 08:05:10 -0700


MammutMKII opened a new issue #19384:
URL: https://github.com/apache/airflow/issues/19384



   ### Description
   
   Add an optional retry loop to LivyOperator.poll_for_termination() or 
LivyHook.get_batch_state() to improve resilience against transient errors. The 
retry counter should reset after each successful request.
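
   A minimal sketch of the requested behaviour, not the actual Airflow 
implementation. The function `poll_with_retries`, its parameters, and the 
`get_state` callable are hypothetical stand-ins for a call such as 
LivyHook.get_batch_state(); the set of terminal states and the broad 
`except Exception` are assumptions made for illustration.

```python
import time
from typing import Callable, Collection

# Assumed set of terminal batch states, for illustration only.
TERMINAL_STATES = frozenset({"success", "dead", "killed", "error"})


def poll_with_retries(
    get_state: Callable[[], str],
    max_retries: int = 3,
    polling_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal_states: Collection[str] = TERMINAL_STATES,
) -> str:
    """Poll get_state() until a terminal state is reached.

    Up to max_retries consecutive failed requests are tolerated.
    The failure counter resets after every successful request.
    """
    consecutive_failures = 0
    while True:
        try:
            state = get_state()
        except Exception:
            # Hypothetical: a real hook would catch a narrower exception type.
            consecutive_failures += 1
            if consecutive_failures > max_retries:
                raise
        else:
            consecutive_failures = 0  # a successful request resets the counter
            if state in terminal_states:
                return state
        sleep(polling_interval)
```

   With `max_retries=2`, two failed requests in a row are tolerated and the 
third consecutive failure is re-raised, so the task fails only on a sustained 
outage.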
   
   ### Use case/motivation
   
   1. Using LivyOperator, we run a Spark Streaming job in a cluster behind Knox 
with LDAP authentication.
   2. While the streaming job is running, LivyOperator keeps polling for 
termination.
   3. In our case, the LDAP service is briefly unavailable for a few of the 
polling requests per day, and Knox returns an error for those requests.
   4. LivyOperator marks the task as failed even though the streaming job is 
most likely still running, which a subsequent polling request would have shown.
   5. We would like LivyOperator/LivyHook to retry a failed polling request a 
number of times in order to overcome those brief availability issues.
   
   Workarounds we considered:
   - Increase the polling interval to reduce the chance of running into an 
error. For reference, we currently use an interval of 10s.
   - Use BaseOperator retries and only send the notification email for the 
final failure. But each retry would start a new job unnecessarily.
   - Activate Knox authentication caching, which decreases the chance of errors 
substantially. But it caused issues unrelated to Airflow.
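
   For reference, the BaseOperator retries workaround could be configured 
roughly as follows. This is only a sketch: the dictionary name and all values 
are illustrative, and the keys are the standard BaseOperator arguments 
`retries`, `retry_delay`, `email_on_retry` and `email_on_failure`, which could 
be passed to the task directly or through the DAG's `default_args`.

```python
from datetime import timedelta

# Illustrative values only. Every retry resubmits the job through Livy,
# which is the drawback described in the list above.
retrying_task_args = {
    "retries": 3,                         # task-level retries, not polling retries
    "retry_delay": timedelta(minutes=1),  # wait between task attempts
    "email_on_retry": False,              # stay quiet on intermediate failures
    "email_on_failure": True,             # notify only once retries are exhausted
}
```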
   
   ### Related issues
   
   No related issues were found
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

