dstandish commented on a change in pull request #18447:
URL: https://github.com/apache/airflow/pull/18447#discussion_r720720951
##########
File path: airflow/providers/amazon/aws/hooks/redshift.py
##########
@@ -126,3 +133,91 @@ def create_cluster_snapshot(self, snapshot_identifier:
str, cluster_identifier:
ClusterIdentifier=cluster_identifier,
)
return response['Snapshot'] if response['Snapshot'] else None
+
+
+class RedshiftSQLHook(DbApiHook):
+ """
+ Execute statements against Amazon Redshift, using redshift_connector
+
+ This hook requires the redshift_conn_id connection. This connection must
+ be initialized with the schema. Additional connection
+ options can be passed to extra as a JSON string.
Review comment:
I think it would be best to point to the connection howto at
`_howto/connection:redshift` here and don't address connection details.
One thing that's confusing about the wording here is that `schema` is
ambiguous because you really mean `Connection` schema and redshift _database_.
And while you could update the wording here, I think better is to just make
sure that usage is clear in the howto and point them there -- no need to
duplicate.
In the howto, I would add a simple example of connection instantiation in
the most common one or two scenarios (e.g. db auth and iam). Something that
wasn't obvious to me is that when trying IAM is you actually have to use
`db_user` and not `user`. An example for this would help.
Additionally, instead of forcing user to use conn attrs (in this case
conn.schema) I would recommend allowing the user to connection params in either
extra or conn attributes.
So in this case where you have `conn.schema` required, better would be to
allow `extra_dejson['database'] or conn.schema`. And while you could add
parameter validation logic for that, I think it's better to just defer to the
library on that, especially since all but one are only _conditionally_ required.
With this library, having ambiguity both with database vs schema and more
importantly with user vs db_user vs login, I think it may be simplest for users
to just dump everything in extra and therefore we should make sure to support
that.
So what does this boil down to in terms of changes...
I think it's best to simply remove the `Connection.schema` requirement (and
let the library complain when it doesn't have the right kwargs). Then if
conn.schema is not provided, but `database` is provided in extra, it will still
work (and I think your code already supports this, once you remove the
parameter validation). And same with `conn.login` and `user` -- let extra
override conn attr.
And could you please add tests that verify the connection parsing behavior
-- i.e. what happens when passed in conn attr and conn extra, or when not in
attr but yes in extra, and when passed in both.
Thanks for bearing with the iteration on this one... getting to know some
more of the details as we go, and we're definitely getting closer.
---
Separately.... I just want to mention that it might be helpful if, in
redshift_connector, you raise a more helpful error message when the parameter
combination is bad in a couple cases. In particular when i tried iam with
`user` (instead of `db_user` i got this:
```
cache_key: str = IamHelper.get_credentials_cache_key(info)
File "...python3.8/site-packages/redshift_connector/iam_helper.py", line
330, in get_credentials_cache_key
return ";".join(
TypeError: sequence item 0: expected str instance, NoneType found
```
I think the real problem is that I need to use `db_user` instead of `user`.
Similarly when I omitted `database` I got this error:
```
File "...python3.8/site-packages/redshift_connector/core.py", line 663, in
__init__
raise self.error
redshift_connector.error.ProgrammingError: {'S': 'FATAL', 'C': '3D000', 'M':
'database "IAM:my_admin" does not exist', 'F':
'/home/ec2-user/padb/src/pg/src/backend/utils/init/postinit.c', 'L': '517',
'R': 'LcInitPostgres'}
```
And the problem was that I did not supply `database`. Raising a more
helpful error message in this case would make it more friendly for us to defer
parameter validation to the library :)
Thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]