dstandish commented on PR #40568:
URL: https://github.com/apache/airflow/pull/40568#issuecomment-2207030155
> I don't disagree. In this case, most of the file was already converted to
> tenacity; it looked to me like whoever did it either did only the easy ones or
> never got around to finishing. I also figured if we do decide to implement the
> tiered backoff idea from the other PR, then it's easier to make that change.
> Also also, I just irrationally dislike seeing hardcoded `while True: sleep()`
> loops for some reason. :P
Ok, I think I understand my problem here better now.
First, I'm with you on the tiered backoff thing and such -- I see the value of
tenacity for that kind of thing.
And with a strong caveat of "you do you" :) .....
Re just never getting around to finishing the conversion to tenacity... I think
it's a little different. The existing uses are retrying after something
_unexpected_ happens, e.g. a connection error or something on a specific
network call. But the use here is more "waiting for something _expected_ to
happen". When you see that decorator there, it basically signals, "this method
retries on these failures". But here it doesn't mean that. It _looks_ like it
is retrying after an error, but really it's just a mechanism to wait
indefinitely.
And if you wanted to add retry-on-_unexpected_ type behavior on top, you'd need
to stack another decorator. That's no big deal of course, but it sort of
illustrates the two different types of usage.
In any case, I think I have a suggestion that may be good enough for
everyone.
What I would suggest is to just move the tenacity part inside the function.
Something like this:
```python
import tenacity

for attempt in tenacity.Retrying(
    retry=tenacity.retry_if_exception_type(ResourceNotReadyException),
    wait=tenacity.wait_fixed(2),
):
    with attempt:
        remote_pod = self.read_pod(pod)
        if <pod not ready>:  # placeholder condition
            self.log.info("Pod %s has phase %s", pod.metadata.name, remote_pod.status.phase)
            raise ResourceNotReadyException()
```
Then we don't need to decorate the whole function with tenacity, so it's
clearer that it's a single synchronous wait process that is _not_ actually
retried.
Since we're on the topic, I just want to point out that there is some ugliness
in our current usage of tenacity, and I think we may have overdone it a bit.
For example, I see we have it added on `get_container_names`. But from the
looks of it, the source of retries in there is `read_pod`, and that itself
already has tenacity on it! So if `read_pod` is having trouble, instead of
retrying 3 times (as the params indicate), it will actually retry 9 times :)
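A toy illustration of that multiplication (hypothetical names, and a plain-Python retry decorator instead of tenacity, to keep it self-contained): when the inner call retries N times and the outer caller also retries M times, a persistent inner failure costs M * N attempts.

```python
calls = {"read_pod": 0}

def retry(times):
    """Minimal stand-in for a retry decorator: re-invoke up to `times` times."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            last = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last = e
            raise last
        return wrapper
    return deco

@retry(3)
def read_pod():
    # Simulate a pod API that is persistently unreachable.
    calls["read_pod"] += 1
    raise ConnectionError("pod API unavailable")

@retry(3)
def get_container_names():
    return read_pod()

try:
    get_container_names()
except ConnectionError:
    pass

print(calls["read_pod"])  # 9 -- not the 3 the outer params suggest
```

So the outer decorator triples the already-tripled inner attempts, which is rarely what the params were meant to express.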
Again, you do you -- and happy holidays, even though you are Canadian