Taragolis commented on code in PR #26162:
URL: https://github.com/apache/airflow/pull/26162#discussion_r963937927
##########
airflow/providers/docker/hooks/docker.py:
##########
@@ -66,46 +81,51 @@ def __init__(
if not docker_conn_id:
raise AirflowException('No Docker connection id provided')
-
- conn = self.get_connection(docker_conn_id)
-
- if not conn.host:
- raise AirflowException('No Docker URL provided')
- if not conn.login:
- raise AirflowException('No username provided')
- extra_options = conn.extra_dejson
-
+ self.docker_conn_id = docker_conn_id
self.__base_url = base_url
self.__version = version
self.__tls = tls
self.__timeout = timeout
- if conn.port:
- self.__registry = f"{conn.host}:{conn.port}"
- else:
- self.__registry = conn.host
- self.__username = conn.login
- self.__password = conn.password
- self.__email = extra_options.get('email')
- self.__reauth = extra_options.get('reauth') != 'no'
- def get_conn(self) -> APIClient:
+ @cached_property
+ def api_client(self) -> APIClient:
+ """Create connection to docker host and login to the docker
registries. (cached)"""
+ conn = self.get_connection(self.docker_conn_id)
client = APIClient(
base_url=self.__base_url, version=self.__version, tls=self.__tls,
timeout=self.__timeout
)
- self.__login(client)
+
+ credential_helper = conn.extra_dejson.get("credential_helper")
+ if not credential_helper:
+ # If not specified credential helper than retrieve information
from Connection.
+ credential_helper = AirflowConnectionDockerCredentialHelper
+ credential_helper_kwargs = {}
+ else:
+ credential_helper = import_string(credential_helper)
Review Comment:
`conn_id` also part of the DAG, so it should pass the review as well as all
components which DAG expected to use.
Let's imagine that someone change connection after DAG deployed (some one
with Admin or Op role) that could be mainly by two reason
1. Someone with access to change connections compromised their account and
"bad guy" also have access to Airflow which mostly intend to live in private
subnets with access by proxy, bastion/jump hosts and VPNs.
2. Someone with Admin or Op is "bad guy/imposer"
If some kind of exploit already in environment there is no problem to use
Option 2 or might be Option 3 - it also uses import_string for DB Api Hooks.
But also it a lot of Operator expected to run template especially
BashOperator might not be safe if templates stored in Airflow Variables.
'bash_command' + 'env' = run whatever you want.
One side effect of all above - required to much steps and a lot of things
have to happen: docker installed, bash operators uses with Variables,
credentials of airflow leaked, Airflow webserver exposed to the Internet.
Don't get me wrong I've also think that **too much security Is Never
Enough**.
But I believe that more chance that user expose somehow cloud credentials
which use by Airflow (a lot of chances with this creds has access to almost
everything): e.g. some cred info stored in extra field in Connection.
Or also almost all providers which grant ability to send messages e.g.:
Slack, Discord, Telegram and MS Teams (just a PR) has ability to get
token/credentials directly in operator code (without connections) and do not
mask it as secrets (actually it doesn't required because user will store
directly in DAG code). A lot of chance that it expose in logs especially if it
uses HttpHook. Some "bad guy" could easily use this creds for send messages in
corporate messenger - e.g. nice link to change domain user/password.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]