argemiront opened a new issue #15495:
URL: https://github.com/apache/airflow/issues/15495
**Apache Airflow version**:
2.0.2
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
**Environment**:
- **Cloud provider or hardware configuration**:
- **OS** (e.g. from /etc/os-release): official airflow images
_apache/airflow:2.0.2-python3.8_ and apache/airflow:2.0.2-python3.6
- **Kernel** (e.g. `uname -a`):
- **Install tools**: docker
- **Others**:
**What happened**:
Using the official 2.0.2 airflow image the 3rd-party packages are not being
recognized by the DAG parser, firing the `ModuleNotFoundError` on DAGs and
plugins.
As a test, I created a new docker image with one 3rd-party package
(`surveymonty`). A simple DAG with no tasks imports it. The following error is
observed:
```
Broken DAG: [/usr/src/airflow/dags/simpledag.py] Traceback (most recent call
last):
File "<frozen importlib._bootstrap>", line 219, in
_call_with_frames_removed
File "/usr/src/airflow/dags/simpledag.py", line 2, in <module>
import surveymonty
ModuleNotFoundError: No module named 'surveymonty'
```
I could confirm the package is installed by running `pip freeze` and `python
simpledag.py` with no errors on both cases.
**What you expected to happen**:
3rd-party packages to be recognized by the parser.
Using a python image and manually installing airflow 2.0.2 and the
requirements, the issue is not present.
**How to reproduce it**:
Create the following Dockerfile:
```
FROM apache/airflow:2.0.2-python3.8
USER root
RUN pip3 install surveymonty==0.2.5
WORKDIR /usr/src/airflow
ENV AIRFLOW_HOME /usr/src/airflow
ENV AIRFLOW_GPL_UNIDECODE true
RUN chown airflow:airflow .
USER airflow
COPY ./simpledag.py dags/simpledag.py
# Airflow webserver
EXPOSE 8080
ENTRYPOINT []
```
simpledag.py:
```
from airflow import DAG
import surveymonty
dag = DAG('test', schedule_interval="* * * * *")
```
docker-compose.yml:
```
version: '3.8'
services:
db:
image: postgres:9.6
environment:
- POSTGRES_USER=airflow
- POSTGRES_PASSWORD=airflow
- POSTGRES_DB=airflow
ports:
- "5432:5432"
webserver:
build: .
restart: always
depends_on:
- db
environment:
- AIRFLOW__CORE__LOAD_EXAMPLES=False
- AIRFLOW__CORE__DAGS_FOLDER=/usr/src/airflow/dags
- AIRFLOW__CORE__PLUGINS_FOLDER=/usr/src/airflow/plugins
-
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@db:5432/airflow
ports:
- "8080:8080"
command: bash -c "airflow initdb && airflow webserver"
scheduler:
build: .
restart: always
depends_on:
- db
environment:
- AIRFLOW__CORE__LOAD_EXAMPLES=False
- AIRFLOW__CORE__DAGS_FOLDER=/usr/src/airflow/dags
- AIRFLOW__CORE__PLUGINS_FOLDER=/usr/src/airflow/plugins
-
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@db:5432/airflow
command: airflow scheduler
```
I'm my _production_ case, all 3rd-party libraries imported on plugins and
DAGs fire the `ModuleNotFoundError` and the latest Google provider has multiple
failures as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]