[ https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267954#comment-15267954 ]
Jeremiah Lowin commented on AIRFLOW-31:
---------------------------------------

Here's a more complete description of the behavior, taken from the PR. We had a larger discussion about this at the time, but it was immediately before the Apache move picked up and I think it got lost in the shuffle (well, I certainly forgot about it, at least!).

------

This PR replaces Airflow's import machinery with standard Python imports. This has an impact in four places:
- airflow.hooks.*
- airflow.operators.*
- airflow.contrib.hooks.*
- airflow.contrib.operators.*

Operators that could formerly be accessed from these packages must now be accessed explicitly from their submodules:

{code}
from airflow.operators import PigOperator               # NO
from airflow.operators.pig_operator import PigOperator  # YES
{code}

Old-style imports will still work but will warn:

{code}
In [4]: airflow.operators.PigOperator
/Users/jlowin/anaconda3/bin/ipython:1: DeprecationWarning: PigOperator: Importing PigOperator directly from 'airflow.operators' has been deprecated. Please import from 'airflow.operators.[operator_module]' instead. Support for direct imports will be dropped entirely in Airflow 2.0.
  #!/bin/bash /Users/jlowin/anaconda3/bin/python.app
Out[4]: pig_operator.PigOperator
{code}

run_unit_tests.sh exports an environment variable, AIRFLOW_USE_NEW_IMPORTS, which turns off the old-style (deprecated) imports completely; unit tests must therefore use the new style. I've tried to adjust the tests already, but there are some I can't run locally, so I will see what Travis has issues with and make adjustments.

Plugins are handled similarly. Let's say a user creates a plugin MyPlugin with some operators. Currently those operators are available at airflow.operators.*.
Through this PR, that behavior is maintained (through Airflow 2.0), but a new module is also dynamically created with the operators at airflow.operators.myplugin.*.

Lastly, I took advantage of the fact that I was going into lots of files to add licenses to all hooks and operators.

> Use standard imports for hooks/operators
> ----------------------------------------
>
>                 Key: AIRFLOW-31
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-31
>             Project: Apache Airflow
>          Issue Type: Improvement
>    Affects Versions: Airflow 2.0
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>              Labels: enhancement
>
> (Migrated from https://github.com/airbnb/airflow/issues/1238)
>
> Currently, Airflow uses a relatively complex import mechanism to import hooks and operators without polluting the namespace with submodules. I would like to propose that Airflow abandon that system and use standard Python importing. Here are a few major reasons why I think the current system has run its course.
>
> h3. Polluting namespace
>
> The biggest advantage of the current system, as I understand it, is that only Operators appear in the `airflow.operators` namespace; the submodules that actually contain the operators do not. So, for example, while `airflow.operators.python_operator.PythonOperator` is a thing, `PythonOperator` is in the `airflow.operators` namespace but `python_operator` is not.
>
> I think this sort of namespace protection was helpful when Airflow was a smaller project, but as the number of hooks/operators grows -- and especially as the `contrib` hooks/operators grow -- I'd argue that namespacing is a *good thing*. It provides structure and organization, and opportunities for documentation (through module docstrings).
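For readers unfamiliar with the machinery being discussed, the namespace behavior above can be sketched in a few lines. This is a rough, standalone illustration, not Airflow's actual code: the function and dict names are hypothetical, and the stdlib `json` module stands in for an operator module so the example runs anywhere.

{code}
import importlib
import types

# Hypothetical stand-in for the old loader: a dict mapping submodule names
# to the attribute names that should be copied into the package namespace.
_operators = {
    "json": ["dumps", "loads"],  # pretend 'json' is an operator module
}

def import_module_attrs(parent_namespace, module_dict):
    """Copy selected attributes from each submodule into the parent
    namespace, silently skipping modules that fail to import."""
    for module_name, attr_names in module_dict.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # missing dependencies just disappear, no error raised
        for name in attr_names:
            parent_namespace[name] = getattr(module, name)

pkg = types.ModuleType("fake_airflow.operators")
import_module_attrs(pkg.__dict__, _operators)

# The attributes are exposed, but the submodule itself is not:
print(hasattr(pkg, "dumps"))  # True
print(hasattr(pkg, "json"))   # False
{code}

This is exactly the trade-off described above: the package namespace stays "clean" of submodules, at the cost of non-standard import behavior.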
> In fact, I'd argue that the current namespace is itself getting quite polluted -- the only way to know what's available is to use something like IPython tab-completion to browse an alphabetical list of Operator names, or to load the source file and grok the import definition (which no one installing from PyPI is likely to do).
>
> h3. Conditional imports
>
> There's a second advantage to the current system: any module that fails to import is silently ignored, which makes it easy to have optional dependencies. For example, if someone doesn't have `boto` installed, then they don't have an `S3Hook` either. The same goes for `HiveOperator`.
>
> Again, as Airflow grows and matures, I think this is a little too magic. If my environment is missing a dependency, I want to hear about it.
>
> On the other hand, the `contrib` namespace sort of depends on this -- we don't want users to have to install every single dependency. So I propose that contrib modules all live in their submodules: `from airflow.contrib.operators.my_operator import MyOperator`. As mentioned previously, having structure and namespacing is a good thing as the project gets more complex.
>
> There are other ways to handle this. One is putting "non-standard" imports inside the operator/hook rather than at module level (see `HiveOperator`/`HiveHook`), so the module can be imported even when the dependency can't be used. Another is judicious use of `try`/`except ImportError`. The simplest is to make people import things explicitly from submodules.
>
> h3. Operator dependencies
>
> Right now, operators can't depend on each other if they aren't in the same file, for the simple reason that there is no guarantee on what order the operators will be loaded. It all comes down to which dictionary key gets loaded first: one day Operator B could be loaded after Operator A; the next day it might be loaded before. Consequently, A and B can't depend on each other.
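The load-order problem can be made concrete with a small self-contained sketch (all names hypothetical; the loader functions stand in for importing operator modules). Note that modern Python dicts preserve insertion order, but the point stands either way: the order is an accident of how the dict was written, and the blanket `except ImportError` means the failure never surfaces.

{code}
namespace = {}

def load_a(ns):
    # Stands in for importing a module that defines OperatorA.
    ns["OperatorA"] = type("OperatorA", (), {})

def load_b(ns):
    # Stands in for a module doing `from ... import OperatorA`:
    # it only works if OperatorA has already been loaded.
    if "OperatorA" not in ns:
        raise ImportError("OperatorA not loaded yet")
    ns["OperatorB"] = type("OperatorB", (ns["OperatorA"],), {})

# The dict's key order -- an accident of how it was written -- decides
# the load order. Here B happens to come first.
loaders = {"b": load_b, "a": load_a}

for name, loader in loaders.items():
    try:
        loader(namespace)
    except ImportError:
        pass  # silently dropped: OperatorB never appears, no error shown

print(sorted(namespace))  # ['OperatorA'] -- OperatorB was lost silently
{code}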
> Worse, if a user makes two operators that do depend on each other, they won't get an error message when one fails to import. For contrib modules in particular, this is sort of a killer.
>
> h3. Ease of use
>
> It's *hard* to set up imports for a new operator. The dictionary-based import instructions aren't obvious to new users, and errors are silently dismissed, which makes debugging difficult.
>
> h3. Identity
>
> Surprisingly, `airflow.operators.SubDagOperator != airflow.operators.subdag_operator.SubDagOperator`. See #1168.
>
> h2. Proposal
>
> Use standard Python importing for hooks/operators/etc.:
> - `__init__.py` files use straightforward, standard Python imports
> - major operators are available at `airflow.operators.OperatorName` or `airflow.operators.operator_module.OperatorName`
> - contrib operators are only available at `airflow.contrib.operators.operator_module.OperatorName`, in order to manage dependencies
> - operator authors are encouraged to use `__all__` to define their module's exports
>
> Possibly delete the namespace afterward:
> - in `operators/__init__.py`, run a function at the end of the file which deletes all modules from the namespace, leaving only Operators. This keeps the namespace clear but lets people use familiar import mechanisms.
>
> Possibly use an import function to handle `ImportError` gracefully:
> - rewrite `import_module_attrs` to take one module name at a time instead of a dictionary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
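As a closing illustration of the compatibility shim the comment describes ("old-style imports will still work but will warn"): one way to get that behavior in modern Python is a module subclass whose `__getattr__` warns and forwards. This is a hedged sketch of the idea, not the PR's actual implementation, and the `_moved` mapping is hypothetical -- stdlib `json.dumps` stands in for a real operator class so the example is runnable (a real mapping would point at something like `("airflow.operators.pig_operator", "PigOperator")`).

{code}
import importlib
import types
import warnings

class _DeprecatedAccessModule(types.ModuleType):
    """Module type whose attribute lookup resolves old-style names
    lazily, emitting a DeprecationWarning on each access."""

    # Hypothetical mapping: old name -> (submodule, attribute).
    _moved = {"PigOperator": ("json", "dumps")}

    def __getattr__(self, name):
        try:
            module_name, attr = self._moved[name]
        except KeyError:
            raise AttributeError(name)
        warnings.warn(
            f"Importing {name} directly from '{self.__name__}' has been "
            f"deprecated. Please import from the submodule instead.",
            DeprecationWarning,
        )
        return getattr(importlib.import_module(module_name), attr)

ops = _DeprecatedAccessModule("fake_airflow.operators")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fn = ops.PigOperator  # old-style access still resolves...

print(caught[0].category)  # ...but a DeprecationWarning was emitted
{code}

The design choice here mirrors the one in the PR: keep old-style access working through a deprecation window, while nudging users toward explicit submodule imports before support is dropped.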