[
https://issues.apache.org/jira/browse/AIRFLOW-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310455#comment-15310455
]
Brian Candler commented on AIRFLOW-200:
---------------------------------------
(1) It appears to be intentional that the CI uses a different set of
requirements to the actual python packaging:
  scripts/ci/requirements.txt => has a simple, direct dependency on unicodecsv
  setup.py => has no direct dependency on unicodecsv, but has this instead:

    hive = [
        'hive-thrift-py>=0.0.1',
        'pyhive>=0.1.3',
        'impyla>=0.13.3',
        'unicodecsv>=0.14.1'
    ]
    extras_require={
        ...
        'hive': hive,

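So this is probably why the CI didn't catch it: the CI environment always
installs unicodecsv, whereas a plain "pip install airflow" only pulls it in
when the hive extra is requested. Unfortunately I don't know enough about
Python packaging and requirements declarations to fix it.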
(2) The magic in `airflow/operators/__init__.py` imports from random locations
into the airflow.operators namespace. Arguably this is broken by design. As far
as I can see, it means that every airflow application will import *all*
possible operators, even the ones it isn't using. This will give a slower
startup time and a larger memory footprint than necessary.
On top of this, it also traps ImportErrors. It can't tell the difference
between import A failing because A doesn't exist and import A failing because
A tries and fails to import library B (e.g. unicodecsv in this case), and it
treats all these errors as normal. This is presumably so that airflow can
continue if you are missing some optional dependency.
However, the result is that a name you were expecting to exist in
airflow.operators (such as airflow.operators.HiveOperator) simply appears not
to be there. By that point the information about why it failed to load has
already been lost.
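
One way to keep that information is sketched below. This is only an illustration, not how `airflow/operators/__init__.py` is actually written: the helper `optional_import` and the `import_failures` registry are hypothetical names made up for this example. The idea is to keep tolerating a missing optional dependency while recording and logging the underlying ImportError, so the real cause (e.g. "No module named unicodecsv") can be looked up later.

```python
import importlib
import logging

log = logging.getLogger(__name__)

# Hypothetical registry: maps each name that could not be loaded to the
# reason its source module failed to import.
import_failures = {}


def optional_import(module_name, attr_names, namespace):
    """Copy attr_names from module_name into namespace.

    Returns True on success. If the module (or something it imports)
    raises ImportError, the error is logged and recorded per name instead
    of being silently discarded, and False is returned.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        reason = "%s: %s" % (module_name, exc)
        for name in attr_names:
            import_failures[name] = reason
        log.warning("Skipping optional module %s", reason)
        return False
    for name in attr_names:
        # An AttributeError here means the name list is wrong, which is a
        # programming error, so it is deliberately not caught.
        namespace[name] = getattr(module, name)
    return True
```

With something like this, a later "cannot import name HiveOperator" could be followed up by checking `import_failures.get('HiveOperator')`, rather than probing the source modules by hand.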
> Hiding import errors / missing dependency on unicodecsv
> -------------------------------------------------------
>
> Key: AIRFLOW-200
> URL: https://issues.apache.org/jira/browse/AIRFLOW-200
> Project: Apache Airflow
> Issue Type: Bug
> Components: core
> Affects Versions: Airflow 1.7.1.2
> Environment: ubuntu 14.04 (python 2.7), new virtualenv, pip install
> airflow
> Reporter: Brian Candler
> Priority: Minor
> Labels: newbie
>
> When running the quickstart instructions at
> http://pythonhosted.org/airflow/start.html inside a clean virtualenv:
> ERROR [airflow.models.DagBag] Failed to import: /home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/example_twitter_dag.py
> Traceback (most recent call last):
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/models.py", line 247, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/example_twitter_dag.py", line 26, in <module>
>     from airflow.operators import BashOperator, HiveOperator, PythonOperator
> ImportError: cannot import name HiveOperator
> Unfortunately that message doesn't help diagnose the problem, which is being
> hidden by auto-import magic.
> It requires manually probing imports from the true source modules:
> >>> from airflow.operators.hive_operator import HiveOperator
> ...
> ImportError: cannot import name HiveCliHook
> >>> from airflow.hooks.hive_hooks import HiveCliHook
> ...
> ImportError: No module named unicodecsv
> Aha. "pip install unicodecsv" fixes the error.
> So I'd suggest two issues:
> 1. Add a packaging dependency on unicodecsv to fix this particular problem
> 2. Fix the auto-import magic so that it doesn't suppress these errors