VladaZakharova commented on PR #32365:
URL: https://github.com/apache/airflow/pull/32365#issuecomment-1642001444

   Hi @potiuk !
   Thank you for your ideas. I have rechecked the flow and changed the logic in 
the code:
   now, when specific types of exceptions are caught, the code retries the 
specified action until the maximum number of retries is reached.
   The exception types to catch were chosen by investigating how the 
connection behaves in different cases:
   
   - if os_login=False, the instance metadata is used for authentication. 
In this case, when multiple parallel connections try to retrieve or change 
the metadata in API calls, a 412 error is raised (either an Airflow exception 
with a 412 error message or HttpError(412)).
   
   - if os_login=True, OS Login is used for authentication. Multiple 
simultaneous connection attempts cause the paramiko library to raise an 
SSHException.
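
   To illustrate the retry behaviour described above, here is a minimal 
sketch, not the code from the PR. The helper name `retry_on`, the `delay` 
parameter, and both exception classes are hypothetical stand-ins (the real 
code would catch paramiko's SSHException and the 412 errors), so that the 
example runs without external libraries:

```python
import time


class SSHException(Exception):
    """Stand-in for paramiko's SSHException (assumption, keeps the sketch dependency-free)."""


class HttpError412(Exception):
    """Stand-in for an HTTP 412 error from a metadata API call (assumption)."""


# Only these exception types trigger a retry; anything else propagates at once.
RETRYABLE = (SSHException, HttpError412)


def retry_on(action, retryable=RETRYABLE, max_retries=5, delay=0.0):
    """Call action() and retry it only when a retryable exception is raised."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except retryable:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the last error
            time.sleep(delay)
```

   Exceptions outside the `retryable` tuple are not caught, so unrelated 
failures still surface immediately.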
   
   I have also modified the method `_connect_to_instance()` to follow this 
logic. It used to run an infinite loop until the connection was actually 
established. Now it makes only 5 attempts at a time. Then, 
if the connection is still not established, the process of generating and 
importing the SSH public key is repeated. This ensures that no process 
gets stuck trying, and failing, to authenticate with the same public key.
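
   A rough sketch of that flow, under stated assumptions: the function name 
`connect_with_key_rotation`, its callable parameters, and the 
`max_key_rounds` limit are hypothetical and only illustrate the idea; the 
actual `_connect_to_instance()` in the PR may be structured differently. 
`SSHException` is again a local stand-in for the paramiko exception:

```python
class SSHException(Exception):
    """Stand-in for paramiko's SSHException (assumption, keeps the sketch dependency-free)."""


def connect_with_key_rotation(generate_key, import_key, connect,
                              attempts_per_key=5, max_key_rounds=3):
    """Try a bounded number of connections per key, then start over with a fresh key."""
    for _ in range(max_key_rounds):
        key = generate_key()   # generate a new SSH key for this round
        import_key(key)        # import its public part (metadata or OS Login)
        for _ in range(attempts_per_key):
            try:
                return connect(key)
            except SSHException:
                continue       # same key, next attempt
        # All attempts with this key failed: fall through and regenerate it.
    raise SSHException("connection not established after regenerating the key")
```

   The inner loop replaces the previous unbounded loop, and the outer loop 
guarantees that a key which never authenticates is eventually discarded.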


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
