[
https://issues.apache.org/jira/browse/AIRFLOW-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876310#comment-16876310
]
Ash Berlin-Taylor commented on AIRFLOW-4872:
--------------------------------------------
In many (but not all as you've discovered) we don't create the hook until
{{execute()}} time of the DAG - that is probably the pattern we should follow
in the sftp_sensor (and others)
> Airflow hooks implement connection fetching inconsistently
> ----------------------------------------------------------
>
> Key: AIRFLOW-4872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4872
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.10.3
> Reporter: James Meickle
> Priority: Minor
>
> We run a CI suite against our DAGs that initializes them to make sure they're
> valid, but doesn't actually run them. Because the DAG doesn't run in CI, we
> don't provide even mock credentials. This bit us during an upgrade from
> 1.10.0 to 1.10.3, when I started seeing this exception in CI:
> {code}
> ERROR [root] Failed to load DAG at
> "/code/nightly_portfolio/nightly_portfolio.py": The conn_id `XXX` isn't
> defined
> Traceback (most recent call last):
> File "./find_dags.py", line 81, in <module>
> dag = m.get_dag(**dag_context)
> File "/code/nightly_portfolio/nightly_portfolio.py", line 527, in get_dag
> >> XXX.move_to_sftp
> File "./shared/resources.py", line 151, in sense_sftp
> sftp_conn_id=self.src_sftp_conn_id,
> File "/usr/local/lib/python3.6/dist-packages/airflow/utils/decorators.py",
> line 98, in wrapper
> result = func(*args, **kwargs)
> File
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/sensors/sftp_sensor.py",
> line 41, in __init__
> self.hook = SFTPHook(sftp_conn_id)
> File
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/hooks/sftp_hook.py",
> line 48, in __init__
> super(SFTPHook, self).__init__(*args, **kwargs)
> File
> "/usr/local/lib/python3.6/dist-packages/airflow/contrib/hooks/ssh_hook.py",
> line 90, in __init__
> conn = self.get_connection(self.ssh_conn_id)
> File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py",
> line 80, in get_connection
> conn = random.choice(cls.get_connections(conn_id))
> File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py",
> line 75, in get_connections
> conns = cls._get_connections_from_db(conn_id)
> File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 73,
> in wrapper
> return func(*args, **kwargs)
> File "/usr/local/lib/python3.6/dist-packages/airflow/hooks/base_hook.py",
> line 58, in _get_connections_from_db
> "The conn_id `{0}` isn't defined".format(conn_id))
> airflow.exceptions.AirflowException: The conn_id `XXX` isn't defined
> {code}
>
> I was able to trace this new failure to this commit:
> [https://github.com/apache/airflow/commit/a0489d09fe4b0d3a2e0893c9cfba0e379e79390b]
> The work to align the SFTP hook with the SSH hook unfortunately also changed
> where it calls {{get_connection}}. Previously it did so in the {{get_conn}}
> method, and now it does so in {{__init_}}_, which _does_ run in our CI tests.
> I have filed this as an "Improvement" rather than a bug because after some
> cursory research it appears that how Airflow handles connection retrieval is
> _inconsistent_. Hooks seem split as to whether to retrieve connection info in
> {{___init___}}, in {{get_conn}}, or in methods that actually use the
> connection.
> My suggestion is that _no_ hooks should retrieve connections in
> {{___init___}}_, and we should classify it as a bug if they do so, going
> forwards. The vast majority of __{{}}_{{_init___}} calls will take place on a
> scheduler or webserver process, representing wasted database calls and log
> lines. Hooks that define a {{get_conn}} method should probably retrieve their
> connection there. Hooks that allow directly calling methods should instead
> define a {{_connection}} property that initializes and persists a connection
> if required, and then returns it.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)