[ 
https://issues.apache.org/jira/browse/AIRFLOW-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876310#comment-16876310
 ] 

Ash Berlin-Taylor commented on AIRFLOW-4872:
--------------------------------------------

In many (but not all as you've discovered) we don't create the hook until 
{{execute()}} time of the DAG - that is probably the pattern we should follow 
in the sftp_sensor (and others)


> Airflow hooks implement connection fetching inconsistently
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-4872
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4872
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.3
>            Reporter: James Meickle
>            Priority: Minor
>
> We run a CI suite against our DAGs that initializes them to make sure they're 
> valid, but doesn't actually run them. Because the DAG doesn't run in CI, we 
> don't provide even mock credentials. This bit us during an upgrade from 
> 1.10.0 to 1.10.3, when I started seeing this exception in CI:
> {code}
> ERROR [root] Failed to load DAG at 
> "/code/nightly_portfolio/nightly_portfolio.py": The conn_id `XXX` isn't 
> defined
> Traceback (most recent call last):
>   File "./find_dags.py", line 81, in <module>
>     dag = m.get_dag(**dag_context)
>   File "/code/nightly_portfolio/nightly_portfolio.py", line 527, in get_dag
>     >> XXX.move_to_sftp
>   File "./shared/resources.py", line 151, in sense_sftp
>     sftp_conn_id=self.src_sftp_conn_id,
>   File "/usr/local/lib/python3.6/dist-packages/airflow/utils/decorators.py", 
> line 98, in wrapper
>     result = func(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/sensors/sftp_sensor.py",
>  line 41, in __init__
>     self.hook = SFTPHook(sftp_conn_id)
>   File 
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/hooks/sftp_hook.py", 
> line 48, in __init__
>     super(SFTPHook, self).__init__(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/hooks/ssh_hook.py", 
> line 90, in __init__
>     conn = self.get_connection(self.ssh_conn_id)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py", 
> line 80, in get_connection
>     conn = random.choice(cls.get_connections(conn_id))
>   File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py", 
> line 75, in get_connections
>     conns = cls._get_connections_from_db(conn_id)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 73, 
> in wrapper
>     return func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py", 
> line 58, in _get_connections_from_db
>     "The conn_id `{0}` isn't defined".format(conn_id))
> airflow.exceptions.AirflowException: The conn_id `XXX` isn't defined
> {code}
>  
>  I was able to trace this new failure to this commit: 
> [https://github.com/apache/airflow/commit/a0489d09fe4b0d3a2e0893c9cfba0e379e79390b]
> The work to align the SFTP hook with the SSH hook unfortunately also changed 
> where it calls {{get_connection}}. Previously it did so in the {{get_conn}} 
> method, and now it does so in {{__init_}}_, which _does_ run in our CI tests.
> I have filed this as an "Improvement" rather than a bug because after some 
> cursory research it appears that how Airflow handles connection retrieval is 
> _inconsistent_. Hooks seem split as to whether to retrieve connection info in 
> {{___init___}}, in {{get_conn}}, or in methods that actually use the 
> connection.
> My suggestion is that _no_ hooks should retrieve connections in 
> {{___init___}}_, and we should classify it as a bug if they do so, going 
> forwards. The vast majority of __{{}}_{{_init___}} calls will take place on a 
> scheduler or webserver process, representing wasted database calls and log 
> lines. Hooks that define a {{get_conn}} method should probably retrieve their 
> connection there. Hooks that allow directly calling methods should instead 
> define a {{_connection}} property that initializes and persists a connection 
> if required, and then returns it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to