[ https://issues.apache.org/jira/browse/AIRFLOW-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310455#comment-15310455 ]

Brian Candler commented on AIRFLOW-200:
---------------------------------------

(1) It appears to be intentional that the CI uses a different set of 
requirements from the actual Python packaging:

scripts/ci/requirements.txt => has a simple, direct dependency on unicodecsv
setup.py => has no direct dependency on unicodecsv, but has this instead:

hive = [
    'hive-thrift-py>=0.0.1',
    'pyhive>=0.1.3',
    'impyla>=0.13.3',
    'unicodecsv>=0.14.1'
]

extras_require={
    ...
    'hive': hive,
}

So this is probably why the CI didn't catch it. Unfortunately I don't know 
enough about Python packaging and requirements declarations to fix it.
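
For what it's worth, the semantics of extras_require are that an extra is only 
installed when the user asks for it explicitly, e.g. "pip install 
airflow[hive]". A plain "pip install airflow" therefore never pulls in 
unicodecsv, while scripts/ci/requirements.txt installs it unconditionally, so 
the CI environment and a default install genuinely diverge.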

(2) The magic in `airflow/operators/__init__.py` imports names from many 
scattered modules into the airflow.operators namespace. Arguably this is broken 
by design. As far as I can see, it means that every airflow application imports 
*all* possible operators, even the ones it isn't using, which gives a slower 
startup time and a larger memory footprint than necessary.

On top of this, it also traps ImportErrors. It can't tell the difference 
between `import A` failing because module A doesn't exist, and `import A` 
failing because A itself tries and fails to import library B (e.g. unicodecsv 
in this case), so it treats all of these errors as normal. Presumably this is 
so that airflow can continue when some optional dependency is missing.
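
To illustrate, the auto-import magic is roughly equivalent to the following (a 
minimal sketch; the module list and variable names are illustrative, not the 
exact Airflow source):

    # Sketch of the pattern in airflow/operators/__init__.py
    _operators = {
        'bash_operator': ['BashOperator'],
        'hive_operator': ['HiveOperator'],
        'python_operator': ['PythonOperator'],
    }

    for _module, _classes in _operators.items():
        try:
            _mod = __import__('airflow.operators.' + _module,
                              fromlist=_classes)
            for _cls in _classes:
                globals()[_cls] = getattr(_mod, _cls)
        except ImportError:
            # A missing operator module and a missing third-party
            # dependency (e.g. unicodecsv) raise the same exception,
            # so both are silently discarded here.
            pass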

However, the result is that a name you were expecting to exist in 
airflow.operators (such as airflow.operators.HiveOperator) appears simply not 
to be there. The information about why it failed to load has already been lost.
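
A less lossy approach (a sketch of the idea only, not a tested patch) would be 
to log the failure before continuing, so that at least the real cause survives 
somewhere:

    import logging

    try:
        from airflow.operators.hive_operator import HiveOperator
    except ImportError as err:
        # Keep the underlying cause visible instead of discarding it.
        logging.warning('Could not import HiveOperator: %s', err)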

> Hiding import errors / missing dependency on unicodecsv
> -------------------------------------------------------
>
>                 Key: AIRFLOW-200
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-200
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: Airflow 1.7.1.2
>         Environment: ubuntu 14.04 (python 2.7), new virtualenv, pip install airflow
>            Reporter: Brian Candler
>            Priority: Minor
>              Labels: newbie
>
> When running the quickstart instructions at 
> http://pythonhosted.org/airflow/start.html inside a clean virtualenv:
> ERROR [airflow.models.DagBag] Failed to import:
> /home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/example_twitter_dag.py
> Traceback (most recent call last):
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/models.py", line 247, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/example_twitter_dag.py", line 26, in <module>
>     from airflow.operators import BashOperator, HiveOperator, PythonOperator
> ImportError: cannot import name HiveOperator
> Unfortunately that message doesn't help diagnose the problem, which is being 
> hidden by auto-import magic.
> It requires manually probing imports from the true source modules:
> >>> from airflow.operators.hive_operator import HiveOperator
> ...
> ImportError: cannot import name HiveCliHook
> >>> from airflow.hooks.hive_hooks import HiveCliHook
> ...
> ImportError: No module named unicodecsv
> Aha. "pip install unicodecsv" fixes the error.
> So I'd suggest two issues:
> 1. Add a packaging dependency on unicodecsv to fix this particular problem
> 2. Fix the auto-import magic so that it doesn't suppress these errors


