Taragolis commented on PR #26162:
URL: https://github.com/apache/airflow/pull/26162#issuecomment-1250873799
@potiuk Right now I think this is the only approach that covers both requirements:
- Auth lives in provider packages or user-defined scripts
- It does not use `import_string`
It requires only small changes to the current operators, and nothing has to be
re-implemented when only the way of authenticating changes. This PR is still a
draft for one reason: I do not know the best place to document how to use Auth.
It might be:
- In the Docker provider: how to implement a custom Auth class and what
DockerHook actually sends to it
- In the Amazon provider: what the user should provide to the ECR-related
class
I would also like to add some notes about other hooks and about things that
might (or might not) be nice to implement in the future. They are not related
to the current PR.
---
### HttpHook
My daily workload does not require it right now, but a couple of years ago we
used it very actively. I do not remember exactly, but in 1.10.4 HttpHook may not
even have had a parameter for request Auth (today I am too lazy to check).
The major issue is that currently you need to create a new hook and override the
`get_conn` method if you need to provide anything more than `login` and
`password` from the connection. So maybe it is also a good idea to implement
some generic way to grab credentials from the Connection and pass them into
HttpHook, so that hooks and operators would not have to be re-created when only
custom Auth is required.
I found this code in an old project (note: the code was written for 1.10.x).
#### Custom Auth
```python
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Bearer Authorization implementation for requests."""

    def __init__(self, token):
        self._token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer %s" % self._token
        return r
```
```python
class SomeDumbAuth(AuthBase):
    """Authorization for some private API."""

    def __init__(self, key, secret):
        self._key = key
        self._secret = secret

    def __call__(self, r):
        # Parse the existing form-encoded body (if any) into a dict.
        # maxsplit=1 keeps values that themselves contain "=".
        if r.body:
            b = dict(x.split("=", 1) for x in r.body.split("&"))
        else:
            b = {}
        # Add the credentials and rebuild the body.
        b.update({
            "some_key": self._key,
            "some_secret": self._secret,
        })
        r.body = "&".join("%s=%s" % (k, v) for k, v in b.items())
        return r
```
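Joining the body by hand does not URL-encode the added key and secret. A minimal sketch of the same merge step using only the standard library is shown below. `merge_form_body` is a hypothetical helper name, not part of requests or Airflow, and the sketch assumes that repeated keys in the body may be collapsed to their last value.

```python
from urllib.parse import parse_qsl, urlencode


def merge_form_body(body, extra):
    """Return a form-encoded body with the ``extra`` parameters added.

    Hypothetical helper for illustration; not a requests or Airflow API.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    # parse_qsl decodes percent-escapes; keep_blank_values keeps "a=" pairs.
    params = dict(parse_qsl(body or "", keep_blank_values=True))
    params.update(extra)
    # urlencode escapes every key and value again, including the added ones.
    return urlencode(params)
```

Inside `__call__` this would be used as `r.body = merge_form_body(r.body, {"some_key": self._key, "some_secret": self._secret})`.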
```python
import time

import jwt  # PyJWT


class AppStoreAuth(AuthBase):
    """AppStore Authorization implementation for requests."""

    def __init__(self, private_key, key_id, issuer_id):
        self._private_key = private_key
        self._key_id = key_id
        self._issuer_id = issuer_id

    def __call__(self, r):
        headers = {
            "alg": "ES256",
            "kid": self._key_id,
            "typ": "JWT",
        }
        payload = {
            "iss": self._issuer_id,
            # Token expires in 20 minutes; strftime("%s") is not portable.
            "exp": int(time.time()) + 20 * 60,
            "aud": "appstoreconnect-v1",
        }
        token = jwt.encode(
            payload=payload,
            key=self._private_key,
            algorithm="ES256",
            headers=headers,
        )
        # PyJWT < 2.0 returns bytes, PyJWT >= 2.0 returns str.
        if isinstance(token, bytes):
            token = token.decode("utf-8")
        r.headers["Authorization"] = "Bearer %s" % token
        return r
```
#### Hook which uses custom auth from the connection
```python
import requests
from airflow.hooks.http_hook import HttpHook  # 1.10.x import path


class AppStoreSalesHttpHook(HttpHook):
    """HTTP Hook for AppStore connect API."""

    def __init__(self, endpoint, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.endpoint = endpoint

    def get_conn(self, headers: dict = None):
        """Returns http requests session for AppStore connect use with requests.

        Args:
            headers: additional headers to be passed through as a dictionary
        """
        session = requests.Session()
        if self.http_conn_id:
            conn = self.get_connection(self.http_conn_id)
            if conn.password:
                # The private key is stored with escaped newlines.
                private_key = conn.password.replace("\\n", "\n")
                key_id = conn.extra_dejson.get("KeyId")
                issuer_id = conn.extra_dejson.get("IssuerId")
                session.auth = AppStoreAuth(
                    private_key=private_key,
                    key_id=key_id,
                    issuer_id=issuer_id,
                )
            else:
                raise ValueError(
                    "Missing extra parameters for connection %r" % self.http_conn_id
                )
            if conn.host and "://" in conn.host:
                self.base_url = conn.host
            else:
                # schema defaults to HTTP
                schema = conn.schema if conn.schema else "http"
                host = conn.host if conn.host else ""
                self.base_url = schema + "://" + host
            if conn.port:
                self.base_url = self.base_url + ":" + str(conn.port)
        if headers:
            session.headers.update(headers)
        return session
```
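The only part of `get_conn` that really differs from the stock hook is how `session.auth` is built. A minimal sketch of the generic approach, where the hook receives a callable that turns a connection into an auth object, could look like this. `ConnectionStub`, `BearerTokenAuth`, `resolve_auth` and both factories are hypothetical names used for illustration, not Airflow APIs. The stub only mimics the `login`, `password` and `extra_dejson` attributes of an Airflow connection so that the example runs without Airflow or requests installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ConnectionStub:
    """Hypothetical stand-in for an Airflow Connection (illustration only)."""
    login: Optional[str] = None
    password: Optional[str] = None
    extra_dejson: Dict[str, Any] = field(default_factory=dict)


class BearerTokenAuth:
    """Callable auth object; requests accepts any callable as ``auth``."""

    def __init__(self, token: str) -> None:
        self._token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer %s" % self._token
        return r


def basic_auth_factory(conn):
    """Default behaviour: a (login, password) tuple, or None without a login."""
    return (conn.login, conn.password) if conn.login else None


def bearer_auth_factory(conn):
    """Assumption for this sketch: the token is stored in the password field."""
    return BearerTokenAuth(conn.password)


def resolve_auth(conn, auth_factory=basic_auth_factory):
    """Return what a generic hook could assign to ``session.auth``."""
    return auth_factory(conn)
```

With something like this, `get_conn` would only need `session.auth = resolve_auth(conn, self.auth_factory)`, where `auth_factory` is a hypothetical constructor argument, and a custom API would supply a factory instead of a new hook.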
---
### PostgresHook
It is not a big deal to change the hook to use custom Auth. One thing I would
change is to drop Redshift support from PostgresHook:
https://github.com/apache/airflow/blob/6045f7ad697e2bdb934add1a8aeae5a817306b22/airflow/providers/postgres/hooks/postgres.py#L191-L205
In the Amazon provider there are two different ways (and two different hooks)
to interact with Redshift:
1. DB-API via `redshift-connector`
2. AWS API via boto3
Even though Redshift uses the PostgreSQL protocol, I am not sure whether
`psycopg2` officially supports Redshift, but `psycopg` (v3) does not officially
support it: https://github.com/psycopg/psycopg/issues/122
---
### MySQL
I am still not sure that the current implementation actually allows using AWS
IAM (need to check).
Also, for MySQL the Auth class needs the ability to specify which driver the
user actually uses:
https://github.com/apache/airflow/blob/6045f7ad697e2bdb934add1a8aeae5a817306b22/airflow/providers/mysql/hooks/mysql.py#L169-L179
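One way an Auth class could account for the driver is to let the hook pass the driver name and get back keyword arguments in that driver's spelling. The sketch below is an assumption about how this could look, not existing Airflow code: `StaticMySqlAuth` and `get_conn_kwargs` are hypothetical names, and the keyword names (`passwd` for `mysqlclient`, `password` for `mysql-connector-python`) are ones that each driver's `connect()` accepts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StaticMySqlAuth:
    """Hypothetical driver-aware auth class; not part of Airflow."""
    user: str
    password: str

    def get_conn_kwargs(self, driver: str) -> Dict[str, str]:
        """Return credentials using the keyword names of the given driver."""
        if driver == "mysqlclient":
            return {"user": self.user, "passwd": self.password}
        if driver == "mysql-connector-python":
            return {"user": self.user, "password": self.password}
        raise ValueError("Unsupported MySQL driver: %r" % driver)
```

An IAM-based class could expose the same method and return a freshly generated token instead of a static password, which is why the driver name has to reach the Auth class at all.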
---
### Extensible Connection / Pluggable Auth
In my head this part is right now a bit more than just Auth.
Currently an Airflow Connection might be used for different things:
- For authentication
- For client configuration
- As parameters for an API call (like the Amazon EMR Connection, which is
actually not a connection and is used in only one place)
The main problem is how it is shown in the UI and how it is stored. A good
example is the AWS Connection:
- Quite a few different methods and extra parameters which actually won't
work together
- boto3 configurations
An Extensible Connection might show a different UI depending on the selected
Auth Type; it would also be nice to have different tabs for Auth and
Configuration.
But again, it is just an idea which might be brilliant in my head but in the
real world requires a lot of changes.
Also, I have not even tried to sketch on paper how it would work and integrate
with Airflow and its components.
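Purely as an illustration of the "different UI per Auth Type" idea, and not a proposed design, the selection could be modelled as a registry that maps an auth type to the fields it needs. Everything here is hypothetical: `AuthTypeSpec`, `AUTH_TYPES`, `fields_for`, and the example auth types and field names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AuthTypeSpec:
    """Hypothetical description of one auth type and the fields it needs."""
    label: str
    fields: Tuple[str, ...]


# Hypothetical registry; the auth types and field names are examples only.
AUTH_TYPES: Dict[str, AuthTypeSpec] = {
    "access_key": AuthTypeSpec("Access key", ("login", "password")),
    "assume_role": AuthTypeSpec("Assume role", ("role_arn",)),
}


def fields_for(auth_type: str) -> Tuple[str, ...]:
    """Return the form fields a UI would render for the selected auth type."""
    try:
        return AUTH_TYPES[auth_type].fields
    except KeyError:
        raise ValueError("Unknown auth type: %r" % auth_type) from None
```

Because each auth type declares only its own fields, parameters that do not work together would never appear on the same form.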