dwreeves opened a new issue, #24572:
URL: https://github.com/apache/airflow/issues/24572

   ### What do you see as an issue?
   
   Relevant page: 
https://airflow.apache.org/docs/apache-airflow-providers-snowflake/stable/connections/snowflake.html
   
   ## Behavior in the Airflow package
   
   The `SnowflakeHook` object in Airflow behaves oddly compared to some other 
database hooks like Postgres (so extra clarity in the documentation is 
beneficial).
   
   Most notably, the `SnowflakeHook` does _not_ make use of either the 
`host` or `port` of the `Connection` object it consumes. It is completely 
pointless to specify these two fields.
   
   When constructing the URL in a runtime context, `snowflake.sqlalchemy.URL` 
is used for parsing. `URL()` allows for either `account` or `host` to be 
specified as kwargs. Either one of these 2 kwargs will correspond with what 
we'd conventionally call the host in a typical URL's anatomy. However, because 
`SnowflakeHook` never parses `host`, any `host` defined in the Connection 
object would never get this far into the parsing.
   
   ## Issue with the documentation
   
   Right now the documentation does not make clear that it is completely 
pointless to specify the `host`. The documentation correctly omits the port, 
but says that the host is optional. It does not warn the user about this field 
never being consumed at all by the `SnowflakeHook` ([source 
here](https://github.com/apache/airflow/blob/main/airflow/providers/snowflake/hooks/snowflake.py)).
   
   This can lead to some confusion, especially because the Snowflake URI 
consumed by `SQLAlchemy` (which many people using Snowflake will be familiar 
with) uses either the "account" or "host" as its host. So a user coming from 
SQLAlchemy may think it is fine to put the account in the "host" field and skip 
filling in the "account" inside the extras (after all, it's "extra"), but that 
doesn't work, because the hook never reads the `host`.
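
   To illustrate the anatomy being described, here is a minimal sketch that 
uses only the standard library's `urllib.parse`. It is not Airflow's own 
parsing code, and the helper `split_conn_uri` and both URIs are hypothetical; 
it only shows where the account ends up depending on how the URI is written.

```python
from urllib.parse import parse_qs, urlsplit


def split_conn_uri(uri):
    """Return (hostname, query parameters) for a connection-style URI."""
    parts = urlsplit(uri)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.hostname, params


# Hypothetical URI where the account was written in the host position.
host_a, params_a = split_conn_uri(
    "snowflake://user:password@account/db-schema?warehouse=snow-warehouse"
)

# Hypothetical URI with an empty host and the account as a query parameter.
host_b, params_b = split_conn_uri(
    "snowflake://user:password@/db-schema?account=account&warehouse=snow-warehouse"
)

print(host_a, params_a.get("account"))
print(host_b, params_b.get("account"))
```

   In the first case the account lands in the hostname and there is no 
`account` parameter at all; in the second case the hostname is empty and the 
account is available as a query parameter, which is the shape that ends up in 
the extras.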
   
   I would argue that if it is correct to omit the `port` in the documentation 
(which it is), then `host` should also be excluded.
   
   Furthermore, the documentation reinforces this confusion with the last few 
lines, where an environment variable example connection is defined that uses a 
host.
   
   Finally, the documentation says "When specifying the connection in 
environment variable you should specify it using URI syntax", which is no 
longer true as of 2.3.0, since a connection can also be defined as JSON.
   
   
   ### Solving the problem
   
   I have 3 proposals for how the documentation should be updated to better 
reflect how the `SnowflakeHook` actually works.
   
   1. The `Host` option should not be listed as part of the "Configuring the 
Connection" section.
   
   2. The example URI should remove the host. The new example URI would look 
like this: 
`snowflake://user:password@/db-schema?account=account&database=snow-db&region=us-east&warehouse=snow-warehouse`.
 This URI with a blank host works fine; you can test this yourself:
   
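      ```python
      from airflow.models.connection import Connection

      # The host portion of the URI (between "@" and "/") is intentionally empty.
      uri = "snowflake://user:password@/db-schema?account=account&database=snow-db&region=us-east&warehouse=snow-warehouse"
      c = Connection(conn_id="foo", uri=uri)
      print(c.host)  # the host parsed from the URI
      print(c.extra_dejson)  # the query parameters, including "account"
      ```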
   
   3. An example should be provided of a valid Snowflake connection defined 
as JSON. This example would be useful in its own right as an environment 
variable connection that is valid as of 2.3.0, and it would also highlight 
some of the idiosyncrasies of how Airflow defines connections to Snowflake. 
This would also be valuable as a reference for the AWS `SecretsManagerBackend` 
for when `full_url_mode` is set to `False`.
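
   As a rough sketch of what such an example could look like (not the final 
wording for the docs): the snippet below builds a JSON connection and exports 
it as an environment variable. The connection id `snowflake_example` is 
hypothetical, the top-level keys are assumed to follow Airflow's JSON 
connection format, and the extras simply mirror the query parameters of the 
example URI above. Note that there is no `host` or `port` key.

```python
import json
import os

# Assumed JSON layout for an Airflow connection; "account" lives in the
# extras, and "host"/"port" are deliberately left out.
conn = {
    "conn_type": "snowflake",
    "login": "user",
    "password": "password",
    "schema": "db-schema",
    "extra": {
        "account": "account",
        "database": "snow-db",
        "region": "us-east",
        "warehouse": "snow-warehouse",
    },
}

# Hypothetical connection id: snowflake_example.
os.environ["AIRFLOW_CONN_SNOWFLAKE_EXAMPLE"] = json.dumps(conn)
print(os.environ["AIRFLOW_CONN_SNOWFLAKE_EXAMPLE"])
```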
   
   ### Anything else
   
   I wasn't sure whether to label this issue as a provider issue or 
documentation issue; I saw templates for either but not both.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.