taranlu-houzz opened a new issue, #40760: URL: https://github.com/apache/airflow/issues/40760
### Apache Airflow version

2.9.2

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

It seems like Airflow is trying to run the full DAG script inside the virtualenv, which does not have `airflow` or `pendulum` installed in it.

### What you think should happen instead?

Based on the documentation and other examples I have found online, I would expect only the code inside the decorated function to run within the virtualenv.
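To illustrate the expectation, here is a minimal sketch of the pattern I believe the documentation describes (the interpreter path is illustrative for my image):

```python
from airflow.decorators import task

# Minimal sketch of my understanding (interpreter path is illustrative):
# only this function body should run in the external interpreter, so
# imports that exist only in the venv live inside the function.
@task.external_python(python="/home/airflow/blender_venv/bin/python")
def venv_smoke_test():
    import bpy  # installed only in the external virtualenv

    print(bpy.app.version_string)
```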
### How to reproduce

- Run Airflow via Docker Compose as described in this tutorial: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#running-airflow
- Use a custom `Dockerfile` built on top of `apache/airflow:2.9.2`
- Add the various Blender `bpy` package runtime requirements
- Install `pdm` and use `pdm` to install Python 3.11
- Create a new virtualenv using the `pdm`-installed Python 3.11
- Add the `bpy` module to the virtualenv
- Apply a hacky fix to provide dummy CPU MHz data in `/proc/cpuinfo`, which is apparently a longstanding issue with Docker/QEMU on macOS arm: https://gitlab.com/qemu-project/qemu/-/issues/750
- The custom image was built using `--platform linux/amd64` because arch mismatch issues prevent `bpy` from working in Docker on macOS when using arm
- Write a simple test DAG that uses the `@task.external_python()` decorator to run using the virtualenv in the image
- The DAG imports without issues in the Airflow web GUI
- When run, import errors are raised because it tries to import from `airflow` and `pendulum`, which are not installed in the virtualenv
- Connecting directly to the worker container and running Python in the virtualenv confirms that `bpy` is working properly (see the sketch after this list)
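For reference, the in-container check was roughly the following (the script name is arbitrary; `airflow-worker` is the service name from the tutorial's compose file):

```python
# check_bpy.py -- run with the venv interpreter inside the worker container:
#   docker compose exec airflow-worker \
#       /home/airflow/blender_venv/bin/python check_bpy.py
import bpy

# If the image arch or the runtime libs were wrong, the import itself would
# fail, so printing the version is enough to confirm bpy loads.
print(bpy.app.version_string)
```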
### Operating System

macOS: 13.6.7 (22G720)

### Versions of Apache Airflow Providers

_No response_

### Deployment

Docker-Compose

### Deployment details

Customized `Dockerfile`:

<details>
<summary>Click to expand!</summary>

```dockerfile
FROM apache/airflow:2.9.2

ENV PATH="/root/.local/bin:${PATH}"
ENV TZ="America/Los_Angeles"

ARG DEBIAN_FRONTEND="noninteractive"
ARG HZ_WORKDIR="/home/airflow"

WORKDIR ${HZ_WORKDIR}

# ------------------------------------------------------------------------------------------------ #
# NOTE: Need to use root due to how the airflow base is configured.
USER root

RUN apt update

# Install base deps
# NOTE: The `build-essential` lib has some .so that are needed by `bpy`.
RUN apt install -y \
    build-essential \
    neovim

# Install Python and pipx (default system python version: 3.10.6)
RUN apt install -y \
    pipx \
    python3-venv

# Install Blender runtime dependencies
RUN apt install -y \
    libegl1 \
    libgl1-mesa-glx \
    libsm6 \
    libxfixes3 \
    libxi-dev \
    libxkbcommon0 \
    libxrender1 \
    libxxf86vm-dev

USER airflow

# ------------------------------------------------------------------------------------------------ #
# NOTE: Each version of `bpy` supports a specific version of Python.
ENV HZ_BPY_VERSION=4.1.0
ENV HZ_PYTHON_VERSION=3.11

# Install pdm and the Blender-compatible version of Python
RUN pipx install pdm
RUN pdm python install cpython@${HZ_PYTHON_VERSION}

# Create Blender venv
ENV HZ_VENV_PATH="${HZ_WORKDIR}/blender_venv"
ENV HZ_VENV_PYTHON_PATH="${HZ_VENV_PATH}/bin/python"
RUN \
    python_path="$(pdm python list | sed -n 's/.*(\(.*\))/\1/p' | head -n 1)" && \
    "${python_path}" -m venv "${HZ_VENV_PATH}"
RUN "${HZ_VENV_PYTHON_PATH}" -m pip install --upgrade pip setuptools
RUN "${HZ_VENV_PYTHON_PATH}" -m pip install bpy==${HZ_BPY_VERSION}

# ------------------------------------------------------------------------------------------------ #
# Silly macOS Docker workaround for incorrect /proc/cpuinfo
COPY ./fakefopen.c ${HZ_WORKDIR}/
RUN cat /proc/cpuinfo >> fake_cpuinfo
RUN echo "cpu MHz : 2345.678" >> fake_cpuinfo
RUN gcc -Wall -fPIC -shared -o fakefopen.so fakefopen.c -ldl
ENV LD_PRELOAD=${HZ_WORKDIR}/fakefopen.so
```

</details>

The `fakefopen.c` workaround (wouldn't be needed in an actual deployment):

<details>
<summary>Click to expand!</summary>

```c
#define _GNU_SOURCE
#define FAKE "/home/airflow/fake_cpuinfo"

#include <stdio.h>
#include <dlfcn.h>
#include <string.h>

// Interpose fopen() so that reads of /proc/cpuinfo are redirected to the
// patched copy that includes a "cpu MHz" line.
FILE *fopen(const char *path, const char *mode)
{
    FILE *(*original_fopen)(const char *, const char *);
    original_fopen = dlsym(RTLD_NEXT, "fopen");

    if (strcmp(path, "/proc/cpuinfo") == 0) {
        return (*original_fopen)(FAKE, mode);
    } else {
        return (*original_fopen)(path, mode);
    }
}
```

</details>
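As an aside for anyone reproducing this: plain Python `open()` does not go through C `fopen()`, so reading `/proc/cpuinfo` from Python directly will not show the shim's effect. A rough way to confirm the interposition is to call `fopen()` through libc with `ctypes` (a sketch; in this QEMU environment the real `/proc/cpuinfo` lacks the `cpu MHz` line, so seeing it means the shim is redirecting):

```python
# shim_check.py -- rough sanity check that the LD_PRELOAD fopen() shim works.
import ctypes

libc = ctypes.CDLL(None)  # global namespace, includes LD_PRELOAD'ed symbols
libc.fopen.restype = ctypes.c_void_p
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fgets.restype = ctypes.c_char_p
libc.fgets.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_void_p]
libc.fclose.argtypes = [ctypes.c_void_p]

fp = libc.fopen(b"/proc/cpuinfo", b"r")
assert fp, "fopen() failed"

buf = ctypes.create_string_buffer(256)
found = False
while libc.fgets(buf, len(buf), fp):
    if b"cpu MHz" in buf.value:
        found = True
        break
libc.fclose(fp)
print("cpu MHz visible through fopen():", found)
```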
The test DAG:

<details>
<summary>Click to expand!</summary>

```python
import os

import pendulum

from airflow.decorators import (
    dag,
    task,
)

HZ_VENV_PYTHON_PATH: str = os.environ.get("HZ_VENV_PYTHON_PATH")


@dag(
    schedule=None,
    start_date=pendulum.today("UTC"),
    # start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["blender", "test"],
)
def blender_test():
    """A basic test to use Blender via a virtualenv with bpy."""

    @task.external_python(
        task_id="create_blend_file",
        python=HZ_VENV_PYTHON_PATH,
    )
    def create_blend_file() -> str:
        """Create and save a simple blend file."""
        import bpy

        out_file_path = "/tmp/monkey.blend"
        bpy.ops.mesh.primitive_monkey_add()
        bpy.ops.wm.save_as_mainfile(filepath=out_file_path)
        return out_file_path

    @task.external_python(
        task_id="read_blend_file_and_render",
        python=HZ_VENV_PYTHON_PATH,
    )
    def read_blend_file_and_render(blend_file_path: str) -> str:
        """Read the blend file and render it."""
        import bpy

        bpy.ops.wm.open_mainfile(filepath=blend_file_path)
        bpy.context.scene.render.image_settings.file_format = "PNG"
        output_file_path = "/tmp/monkey.png"
        bpy.context.scene.render.filepath = output_file_path
        bpy.ops.render.render(write_still=True)
        return output_file_path

    @task.bash
    def rename_render(render_file_path: str) -> str:
        """Use bash to rename the rendered png file."""
        return f"mv {render_file_path} /tmp/monkey_renamed.png"

    blend_file_path = create_blend_file()
    render_file_path = read_blend_file_and_render(blend_file_path)
    rename_render(render_file_path)


blender_test()
```

</details>

This is the log from the Airflow worker container that shows the error:

<details>
<summary>Click to expand!</summary>

```
BACKEND=redis
DB_HOST=redis
DB_PORT=6379

[2024-07-12T14:19:48.342-0700] {configuration.py:2087} INFO - Creating new FAB webserver config file in: /opt/airflow/webserver_config.py

 -------------- celery@fdf54261b43e v5.4.0 (opalescent)
--- ***** -----
-- ******* ---- Linux-6.6.31-linuxkit-x86_64-with-glibc2.36 2024-07-12 14:19:58
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.providers.celery.executors.celery_executor:0x2aaab7d9d490
- ** ---------- .> transport:   redis://redis:6379/0
- ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default

[tasks]
  . airflow.providers.celery.executors.celery_executor_utils.execute_command

[2024-07-12 14:19:53 -0700] [72] [INFO] Starting gunicorn 22.0.0
[2024-07-12 14:19:53 -0700] [72] [INFO] Listening at: http://[::]:8793 (72)
[2024-07-12 14:19:53 -0700] [72] [INFO] Using worker: sync
[2024-07-12 14:19:53 -0700] [74] [INFO] Booting worker with pid: 74
[2024-07-12 14:19:53 -0700] [76] [INFO] Booting worker with pid: 76
[2024-07-12 14:20:02,573: WARNING/MainProcess] /home/airflow/.local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above. If you wish to retain the existing behavior for retrying connections on startup, you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2024-07-12 14:20:02,642: INFO/MainProcess] Connected to redis://redis:6379/0
[2024-07-12 14:20:02,648: WARNING/MainProcess] /home/airflow/.local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above. If you wish to retain the existing behavior for retrying connections on startup, you should set broker_connection_retry_on_startup to True.
  warnings.warn(

[2024-07-12 14:20:02,657: INFO/MainProcess] mingle: searching for neighbors
[2024-07-12 14:20:03,697: INFO/MainProcess] mingle: all alone
[2024-07-12 14:20:03,787: INFO/MainProcess] celery@fdf54261b43e ready.
[2024-07-12 14:23:34,221: INFO/MainProcess] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] received
[2024-07-12 14:23:34,506: INFO/ForkPoolWorker-15] [6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] Executing command in Celery: ['airflow', 'tasks', 'run', 'blender_test', 'create_blend_file', 'manual__2024-07-12T21:23:31.163038+00:00', '--local', '--subdir', 'DAGS_FOLDER/blender_test.py']
[2024-07-12 14:23:35,814: INFO/ForkPoolWorker-15] Filling up the DagBag from /opt/airflow/dags/blender_test.py
[2024-07-12 14:23:50,109: INFO/ForkPoolWorker-15] Running <TaskInstance: blender_test.create_blend_file manual__2024-07-12T21:23:31.163038+00:00 [queued]> on host fdf54261b43e
Traceback (most recent call last):
  File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 563, in from_name
    return next(cls.discover(name=name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 1009, in version
    return distribution(distribution_name).version
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 982, in distribution
    return Distribution.from_name(distribution_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 565, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for apache-airflow
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pendulum'
[2024-07-12 14:23:52,860: INFO/ForkPoolWorker-15] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] succeeded in 18.6186027990002s: None
```

</details>

### Anything else?

I am new to Airflow and am exploring it as an option for integration with our 3D pipeline. I imagine that I am just doing something incorrectly, but I haven't been able to figure out what is wrong, and as far as I can tell, I am doing things pretty much the same way I have seen in other examples.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
