teslur opened a new issue, #35190: URL: https://github.com/apache/airflow/issues/35190
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

When I ran the workflow from [this example code](https://github.com/apache/airflow/blob/main/tests/system/providers/google/cloud/compute/example_compute_ssh.py), the SSH tunneling command built by `ComputeEngineSSHHook` was executed under a different account than the service account associated with the Connection specified by the `gcp_conn_id` argument. `GoogleBaseHook`, the parent class of the `ComputeEngineHook` that `ComputeEngineSSHHook` uses internally to execute SSH tunneling commands, does not handle Connections that use `"Keyfile Secret Name (in GCP Secret Manager)"`.

### What you think should happen instead

The SSH tunneling command should be executed with the privileges of the service account associated with the Connection specified by `gcp_conn_id`.

### How to reproduce

Based on [this code example](https://github.com/apache/airflow/blob/main/tests/system/providers/google/cloud/compute/example_compute_ssh.py), make the following changes:

1. Attach service account X to the GCE instance where the Airflow worker is running.
2. Create a key for service account Y and register the key JSON in Secret Manager.
3. Grant service account X access to the secret created in step 2.
4. Grant IAP tunneling privileges to service account Y, and allow SSH routes from the IAP range to the GCE instance created by the `gce_instance_insert` task. Make sure service account X does **not** have IAP tunneling permissions.
5. Create a Connection of type `google_cloud_platform` whose `"Keyfile Secret Name"` and `"Keyfile Secret Project"` point to the secret created in step 2.
6. Add the `gcp_conn_id` argument to the `ComputeEngineSSHHook` passed as the `ssh_hook` argument of the `metadata_without_iap_tunnel1` task, specifying the Connection created in step 5.
7. Run the workflow: IAP tunneling keeps failing with timeout errors.
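For context, the keyfile-source precedence that a `google_cloud_platform` Connection can carry might be sketched as follows. This is an illustrative helper, not the provider's actual code; the field names mirror the connection extras, but the logic is a simplification:

```python
# Illustrative sketch (not the provider's actual code) of the keyfile-source
# precedence a google_cloud_platform Connection can carry. The report above is
# that the "Keyfile Secret Name" branch is not honoured on the code path that
# ComputeEngineSSHHook exercises, so resolution falls through to the
# application-default credentials of the worker VM (service account X) instead
# of the secret-stored key of service account Y.

def pick_keyfile_source(extras: dict) -> str:
    """Return a label for the credential source a Connection would use."""
    if extras.get("key_path"):
        return "keyfile_path"
    if extras.get("keyfile_dict"):
        return "keyfile_dict"
    if extras.get("key_secret_name"):
        # The branch missing from the ComputeEngineSSHHook path: fetch the
        # key JSON from Secret Manager before authenticating.
        return "keyfile_secret_manager"
    # No keyfile configured: fall back to the VM's attached service account.
    return "application_default_credentials"

# The reproduction's Connection sets only the secret fields (hypothetical
# values), so it should resolve via Secret Manager:
conn_extras = {"key_secret_name": "sa-y-key", "key_secret_project_id": "my-project"}
print(pick_keyfile_source(conn_extras))  # → keyfile_secret_manager
```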
**Note:** For exact reproduction, in addition to the above changes, you will need to monkey patch the points mentioned in this discussion: https://github.com/apache/airflow/discussions/35173. Without the monkey patch, even if `GoogleBaseHook` obtains the correct authentication information, the SSH tunneling process will not use it.

### Operating System

Ubuntu

### Versions of Apache Airflow Providers

```sh
$ pip freeze | grep apache-airflow-providers
apache-airflow-providers-amazon==8.6.0
apache-airflow-providers-celery==3.3.3
apache-airflow-providers-cncf-kubernetes==7.5.0
apache-airflow-providers-common-sql==1.7.1
apache-airflow-providers-daskexecutor==1.0.1
apache-airflow-providers-docker==3.7.4
apache-airflow-providers-elasticsearch==5.0.1
apache-airflow-providers-ftp==3.5.1
apache-airflow-providers-google==10.7.0
apache-airflow-providers-grpc==3.2.2
apache-airflow-providers-hashicorp==3.4.2
apache-airflow-providers-http==4.5.1
apache-airflow-providers-imap==3.3.1
apache-airflow-providers-microsoft-azure==6.3.0
apache-airflow-providers-mysql==5.3.0
apache-airflow-providers-odbc==4.0.0
apache-airflow-providers-openlineage==1.0.2
apache-airflow-providers-postgres==5.6.0
apache-airflow-providers-redis==3.3.1
apache-airflow-providers-sendgrid==3.2.2
apache-airflow-providers-sftp==4.6.0
apache-airflow-providers-slack==8.0.0
apache-airflow-providers-snowflake==5.0.0
apache-airflow-providers-sqlite==3.4.3
apache-airflow-providers-ssh==3.7.2
```

### Deployment

Docker-Compose

### Deployment details

The Airflow version is:

```sh
$ airflow version
2.7.1
```

The Dockerfile of the image used in the compose file is:

```Dockerfile
FROM apache/airflow:2.7.1-python3.8

ENV CONSTRAINT_URL "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.8.txt"

COPY --chown=50000:0 requirements.txt "${AIRFLOW_HOME}/requirements.txt"

USER root

RUN echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        libpq-dev \
        google-cloud-cli \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

USER airflow

RUN pip install --upgrade pip \
    && pip install \
        --no-cache-dir \
        --user \
        --requirement requirements.txt \
        --constraint "${CONSTRAINT_URL}"
```

### Anything else

In my environment, I applied the following patch to work around this issue:

https://github.com/teslur/airflow/compare/543db7004ee593605e250265b0722917cef296d3...a14dc62ffb0bbdb95b3dc3527b06bb79e9f11e6c

The patch also makes a one-line modification to the `._connect_to_instance()` method of `ComputeEngineSSHHook`. Without this modification (using the example from the reproduction steps), when the SSH connection is retried after a timeout or similar failure, the request to obtain the key JSON from Secret Manager is executed with the privileges of service account Y. That request is expected to run with the privileges of service account X attached to the GCE instance, and it does so on the first attempt. It appears that the effect of entering the context via the `.connect()` method of the `_GCloudAuthorizedSSHClient` class persists into the retry, although I do not know why. The patch handles this by explicitly closing the connection, and thereby exiting the context, when an SSH exception is caught.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
