VladaZakharova commented on PR #32365: URL: https://github.com/apache/airflow/pull/32365#issuecomment-1642001444
Hi @potiuk! Thank you for your ideas. I have rechecked the flow and changed the logic in the code: now, when specific types of exceptions are caught, the code retries the action until the maximum number of retries is reached.

I chose which exception types to catch by investigating how the connection behaves in different cases:
- If `os_login=False`, the instance metadata is used for authentication. When multiple parallel connections try to retrieve or change the metadata through API calls, a 412 error is raised (either an Airflow exception with a 412 error message or `HttpError(412)`).
- If `os_login=True`, OS Login is used for authentication. Multiple simultaneous connections make the connection attempt fail with an `SSHException` from the paramiko library.

I have also modified the `_connect_to_instance()` method to follow the same logic. Previously it ran an infinite loop until the connection was actually established. Now it makes at most 5 connection attempts at a time; if the connection is still not established, the SSH key is generated and the public key is imported again, and the attempts are repeated. This ensures that no process gets stuck trying, and failing, to authenticate with the same public key.
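A minimal sketch of the "retry only on specific exceptions, up to a maximum" logic described above. It is not the hook's actual code: `is_retryable` and `retry` are hypothetical names, and two simplifying assumptions let it run without paramiko or the Google API client installed: `SSHException` (including subclasses) is matched by class name, and a 412 is detected by looking for "412" in the exception text.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_retryable(exc: BaseException) -> bool:
    """Return True for the transient failures described in the comment (heuristic)."""
    # os_login=True: concurrent OS Login connections fail with paramiko's SSHException.
    # Assumption: matched by class name (subclasses included) to avoid importing paramiko.
    if any(cls.__name__ == "SSHException" for cls in type(exc).__mro__):
        return True
    # os_login=False: concurrent metadata updates fail with HTTP 412 (Precondition Failed),
    # either as HttpError(412) or as an Airflow exception whose message mentions 412.
    # Assumption: searching the message text is a crude stand-in for real status checks.
    return "412" in str(exc)


def retry(action: Callable[[], T], max_retries: int = 5, delay: float = 0.0) -> T:
    """Call `action`, retrying only retryable errors, for at most `max_retries` attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:
            # Non-transient errors, and the last failed attempt, propagate unchanged.
            if not is_retryable(exc) or attempt == max_retries:
                raise
            time.sleep(delay)
    raise ValueError("max_retries must be at least 1")
```

Retrying only classified errors keeps genuine failures (for example a misconfigured instance name) failing on the first attempt instead of consuming the whole retry budget.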
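The bounded connection behaviour of `_connect_to_instance()` (a few attempts per key, then a freshly generated and imported key) can be sketched with injected callbacks. This is an illustration under assumptions, not the hook's implementation: `connect_with_key_rotation`, its callback parameters, and the outer `max_key_pairs` cap are hypothetical, since the comment does not say whether the key-regeneration rounds are themselves limited.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

C = TypeVar("C")


def connect_with_key_rotation(
    generate_key_pair: Callable[[], Tuple[str, str]],
    import_public_key: Callable[[str], None],
    try_connect: Callable[[str], C],
    *,
    retryable: Tuple[Type[BaseException], ...],
    attempts_per_key: int = 5,
    max_key_pairs: int = 3,
    delay: float = 0.0,
) -> C:
    """Try a few connections per key pair, then start over with a fresh key pair."""
    last_error: Optional[BaseException] = None
    for _ in range(max_key_pairs):
        # New (private, public) key pair each round, so a process never keeps
        # retrying authentication with a key that is not being accepted.
        private_key, public_key = generate_key_pair()
        import_public_key(public_key)
        # Bounded inner loop replaces the former "retry until connected" loop.
        for _ in range(attempts_per_key):
            try:
                return try_connect(private_key)
            except retryable as exc:
                last_error = exc
                time.sleep(delay)
    raise ConnectionError(
        f"no connection after {max_key_pairs} key pairs x {attempts_per_key} attempts"
    ) from last_error
```

Passing `retryable` explicitly (for example paramiko's `SSHException`) keeps unrelated errors out of the loop, and the outer cap turns the former infinite loop into one that eventually reports failure.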
