Daniel Huang created AIRFLOW-770:
------------------------------------
Summary: HDFS hook should support alternative ways of getting connection
Key: AIRFLOW-770
URL: https://issues.apache.org/jira/browse/AIRFLOW-770
Project: Apache Airflow
Issue Type: Improvement
Components: hooks
Reporter: Daniel Huang
Priority: Minor
The HDFS hook currently uses {{get_connections()}} instead of
{{get_connection()}} to grab the connection info. I believe this is so that,
when multiple connections are specified, the hook passes all of them to
snakebite's HAClient instead of choosing one at random.
As far as I can tell, this means connection info can't be set outside of the
UI, since {{get_connections()}} does not look at environment variables (which
had me confused for a bit). Ideally, it should be possible to set connection
info outside of the UI for each of the three snakebite clients. Here are some
possible suggestions for allowing this:
* AutoConfigClient: add an attribute, e.g. {{HDFSHook(...,
autoconfig=True).get_conn()}}
* Client: specify a single URI in an environment variable
* HAClient: specify multiple URIs in an environment variable, separated by commas?
This doesn't really adhere to the standard, and if we did this, we'd probably
want to support it across all hooks.
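
To make the suggestions above concrete, here is a rough sketch of how the environment-variable lookup and client selection might work. It is only an illustration under assumptions: the helper names {{namenodes_from_env}} and {{choose_client}} are hypothetical, the comma-separated format is the unsettled proposal from the list above, and the sketch returns the name of the snakebite client rather than constructing one. The {{AIRFLOW_CONN_}} prefix is the one {{get_connection()}} already uses.

```python
import os
from urllib.parse import urlparse

# Prefix Airflow's get_connection() uses for connection environment variables.
CONN_ENV_PREFIX = "AIRFLOW_CONN_"


def namenodes_from_env(conn_id, environ=None):
    """Hypothetical helper: parse comma-separated URIs into (host, port) pairs.

    The comma-separated format is the proposal from this issue, not an
    existing Airflow convention.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(CONN_ENV_PREFIX + conn_id.upper(), "")
    namenodes = []
    for uri in raw.split(","):
        uri = uri.strip()
        if not uri:
            continue  # skip empty entries such as a trailing comma
        parsed = urlparse(uri)
        namenodes.append((parsed.hostname, parsed.port))
    return namenodes


def choose_client(namenodes, autoconfig=False):
    """Hypothetical helper: name the snakebite client that fits the config."""
    if autoconfig:
        return "AutoConfigClient"  # let snakebite read the Hadoop config itself
    if len(namenodes) == 1:
        return "Client"  # single namenode
    if len(namenodes) > 1:
        return "HAClient"  # several namenodes, all passed along
    raise ValueError("no namenode configured")
```

For example, setting {{AIRFLOW_CONN_HDFS_DEFAULT}} to {{hdfs://nn1:8020,hdfs://nn2:8020}} would yield two namenodes and select HAClient, while a single URI would select Client.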
references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73