dstandish commented on a change in pull request #18447:
URL: https://github.com/apache/airflow/pull/18447#discussion_r722690470



##########
File path: docs/apache-airflow-providers-amazon/connections/redshift.rst
##########
@@ -0,0 +1,82 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. _howto/connection:redshift:
+
+Amazon Redshift Connection
+==========================
+
+The Redshift connection type enables integrations with Amazon Redshift.
+
+Authenticating to Amazon Redshift
+---------------------------------
+
+Authentication may be performed using any of the authentication methods supported by `redshift_connector <https://github.com/aws/amazon-redshift-python-driver>`_, such as direct credentials, IAM authentication, or an Identity Provider (IdP) plugin.
+
+Default Connection IDs
+-----------------------
+
+The default connection ID is ``redshift_default``.
+
+Configuring the Connection
+--------------------------
+
+
+User
+  Specify the username to use for authentication with Amazon Redshift.
+
+Password
+  Specify the password to use for authentication with Amazon Redshift.
+
+Host
+  Specify the Amazon Redshift hostname.
+
+Database
+  Specify the Amazon Redshift database name.
+
+Extra
+  Specify the extra parameters (as a JSON dictionary) that can be used in the
+  Amazon Redshift connection. For a complete list of supported parameters
+  please see the `documentation <https://github.com/aws/amazon-redshift-python-driver#connection-parameters>`_
+  for redshift_connector.
+
+
+When specifying the connection as an environment variable, you should specify it
+using the URI syntax.
+
+Note that all components of the URI should be URL-encoded.
+
+Examples
+--------
+
+Database Authentication
+
+.. code-block:: bash
+
+  AIRFLOW_CONN_REDSHIFT_DEFAULT=redshift://awsuser:passw...@redshift-cluster-1.123456789.us-west-1.redshift.amazonaws.com:5439/?database=dev&ssl=True
+
+IAM Authentication using AWS Profile
+
+.. code-block:: bash
+
+  AIRFLOW_CONN_REDSHIFT_DEFAULT=redshift://:@:/?database=dev&iam=True&db_user=awsuser&cluster_identifier=redshift-cluster-1&profile=default

Review comment:
   Ok, so dealing with Airflow URIs is a bit tricky.
   
   And in this case I think there's a small problem.
   
   Look at the handling of the `iam` parameter:
   
   ```python
   >>> from airflow.models.connection import Connection
   >>> c = Connection(uri='redshift://:@:/?database=dev&iam=True&db_user=awsuser&cluster_identifier=redshift-cluster-1&profile=default')
   >>> c.extra_dejson
   {'database': 'dev', 'iam': 'True', 'db_user': 'awsuser', 'cluster_identifier': 'redshift-cluster-1', 'profile': 'default'}
   ```
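   The stringification above isn't specific to Airflow — standard query-string parsing can only hand back strings. A minimal stdlib sketch of the same effect (using `urllib.parse` as a stand-in for the URI parsing):

   ```python
   from urllib.parse import urlencode, parse_qsl

   extra = {"database": "dev", "iam": True, "db_user": "awsuser"}

   # urlencode stringifies every value, and parse_qsl can only return strings.
   query = urlencode(extra)           # 'database=dev&iam=True&db_user=awsuser'
   reparsed = dict(parse_qsl(query))
   print(reparsed["iam"])             # prints: True  (but it's the *string* 'True')
   ```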
   
   And if redshift_connector gets `iam='True'` instead of `iam=True`, it won't work.
   
   While we _could_ implement logic to handle this in the hook, we don't need to, because there's a way to produce the URI that avoids this issue.
   
   We have a `get_uri` method on `Connection` that produces the URI from a connection object. When the standard URI encoding would lose fidelity (e.g. a bool converted to a string on reparsing), it uses the alternative representation of extra:
   ```python
   >>> import json
   >>> c = Connection(conn_type='redshift', extra=json.dumps({"database": "dev", "iam": True, "db_user": "awsuser", "cluster_identifier": "redshift-cluster-1", "profile": "default"}))
   >>> c.get_uri()
   'redshift://?__extra__=%7B%22database%22%3A+%22dev%22%2C+%22iam%22%3A+true%2C+%22db_user%22%3A+%22awsuser%22%2C+%22cluster_identifier%22%3A+%22redshift-cluster-1%22%2C+%22profile%22%3A+%22default%22%7D'
   ```
   
   It's ugly but it works.
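   The `__extra__` trick can also be sketched with the stdlib alone: packing the whole extra dict into one percent-encoded JSON blob means the types survive the round trip (a sketch — the parameter name and encoding mirror the `get_uri` output above):

   ```python
   import json
   from urllib.parse import quote_plus, unquote_plus

   extra = {"database": "dev", "iam": True, "cluster_identifier": "redshift-cluster-1"}

   # One JSON blob, percent-encoded, carried as a single URI parameter.
   uri = "redshift://?__extra__=" + quote_plus(json.dumps(extra))

   # Reversing both steps recovers the original types: JSON keeps the bool.
   roundtrip = json.loads(unquote_plus(uri.split("__extra__=", 1)[1]))
   print(roundtrip["iam"] is True)    # prints: True
   ```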
   
   So, bottom line: I think the best approach here would be to show examples of how
   to define the connections using `Connection` objects instead of the URI format.
   That's the easiest way to produce the correct URI. You could also show the
   generated URI and how you produced it, as I've done above, or perhaps just point
   to the "generating an airflow URI" section in the core howto / "managing
   connections" doc, where this is covered in detail.
   
   ---
   
   Side note: I hope to implement support for JSON serialization broadly (i.e. as
   an alternative to the airflow URI) along the lines done [here with
   SSM](https://github.com/apache/airflow/pull/18692), which will make this a
   little less painful.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
