juergbi commented on PR #1901:
URL: https://github.com/apache/buildstream/pull/1901#issuecomment-2007962394

   With a reliable network connection to an available RE server, the client 
should never get the UNAVAILABLE error, even with non-trivial usage. However, 
we should indeed retry on UNAVAILABLE to improve operation when a server is 
temporarily unavailable for any reason.
   
   The retry logic is far from optimal, with and without this patch. E.g., 
there is no retry if the first response of `Execute()` or `WaitExecution()` 
fails. And there is no retry delay with backoff or a maximum number of 
attempts. If the initial `Execute()` response is received but there is an error 
later on, it tries to immediately resume the operation stream by calling 
`WaitExecution()`. If that fails, it will give up (as `last_operation` will be 
`None`). If there is a temporary network or server issue, it is likely that the 
immediate retry fails as well.
   
   That said, even the immediate retry on UNAVAILABLE likely helps in some 
cases. The only thing, that I can think of, where this patch makes anything 
worse might be to not report the "Failed contacting remote execution server" in 
case of UNAVAILABLE being fatal (i.e., either it failed already before the 
first response of `Execute()` in which case there will be no retry, or the 
immediate retry fails as well). If `last_operation is None`, we should still 
raise that error, at least as long as we don't retry in that case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to