taranlu-houzz opened a new issue, #40760:
URL: https://github.com/apache/airflow/issues/40760

   ### Apache Airflow version
   
   2.9.2
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   It seems like Airflow is trying to run the full DAG script inside the virtualenv, which does not have `airflow` or `pendulum` installed in it.
   
   ### What you think should happen instead?
   
   Based on the documentation and other examples I have found online, I would expect only the code inside the decorated function to run in the virtualenv.
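
   As a minimal sketch of that expectation (the interpreter path is the venv built in the Dockerfile below), only the body of the decorated function should need to execute under the external interpreter:

   ```python
   from airflow.decorators import task


   # Hypothetical minimal task; `python=` points at the venv interpreter
   # built in the Dockerfile below.
   @task.external_python(python="/home/airflow/blender_venv/bin/python")
   def probe_interpreter():
       # Only this body should run in the external interpreter.
       import sys

       print(sys.executable)  # expected: the venv's python, not Airflow's
   ```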
   
   ### How to reproduce
   
   - Running Airflow via Docker Compose as described in this tutorial: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#running-airflow
   - Using a custom `Dockerfile` built on top of `apache/airflow:2.9.2`
     - Add various Blender `bpy` package runtime requirements
     - Install `pdm` and use `pdm` to install Python 3.11
     - Create a new virtualenv using the `pdm`-installed Python 3.11
     - Add the `bpy` module to the virtualenv
     - Apply a hacky fix that provides dummy CPU MHz data in `/proc/cpuinfo`, which is apparently a longstanding issue with Docker/QEMU on macOS ARM: https://gitlab.com/qemu-project/qemu/-/issues/750
   - The custom image was built using `--platform linux/amd64` because arch mismatch issues prevent `bpy` from working in Docker on macOS when using ARM
   - Write a simple test DAG that uses the `@task.external_python()` decorator to run using the virtualenv in the image
   - The DAG imports without issues in the Airflow web GUI
   - When run, import errors are raised because it tries to import from `airflow` and `pendulum`, which are not installed in the virtualenv
   - Connecting directly to the worker container and running Python in the virtualenv confirms that `bpy` is working properly (see the snippet after this list)
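
   The check I ran inside the worker container was roughly the following (the interpreter path is the `HZ_VENV_PYTHON_PATH` built in the Dockerfile below):

   ```python
   # Run with the venv interpreter on the worker, e.g.:
   #   /home/airflow/blender_venv/bin/python check_bpy.py
   import bpy  # imports cleanly when the venv is intact

   print(bpy.app.version_string)  # prints the Blender version, e.g. "4.1.0"
   ```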
   
   ### Operating System
   
   macOS: 13.6.7 (22G720)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Customized `Dockerfile`:
   <details>
   <summary>Click to expand!</summary>
   
   ```dockerfile
   FROM apache/airflow:2.9.2
   
   ENV PATH="/root/.local/bin:${PATH}"
   ENV TZ="America/Los_Angeles"
   
   ARG DEBIAN_FRONTEND="noninteractive"
   ARG HZ_WORKDIR="/home/airflow"
   
   WORKDIR ${HZ_WORKDIR}
   
   
   # ------------------------------------------------------------------------------------------------ #
   
   
   # NOTE: Need to use root due to how the airflow base is configured.
   USER root
   
   RUN apt update
   
   # Install base deps
   # NOTE: The `build-essential` lib has some .so that are needed by `bpy`.
   RUN apt install -y \
     build-essential \
     neovim
   
   # Install Python and pipx (default system python version: 3.10.6)
   RUN apt install -y \
     pipx \
     python3-venv
   
   # Install Blender runtime dependencies
   RUN apt install -y \
     libegl1 \
     libgl1-mesa-glx \
     libsm6 \
     libxfixes3 \
     libxi-dev \
     libxkbcommon0 \
     libxrender1 \
     libxxf86vm-dev
   
   USER airflow 
   
   
   # ------------------------------------------------------------------------------------------------ #
   
   
   # NOTE: Each version of `bpy` supports a specific version of Python.
   ENV HZ_BPY_VERSION=4.1.0
   ENV HZ_PYTHON_VERSION=3.11
   
   # Install pdm (via pipx) and the Blender-compatible version of Python
   RUN pipx install pdm
   RUN pdm python install cpython@${HZ_PYTHON_VERSION}
   
   # Create Blender venv
   ENV HZ_VENV_PATH="${HZ_WORKDIR}/blender_venv"
   ENV HZ_VENV_PYTHON_PATH="${HZ_VENV_PATH}/bin/python"
   RUN \
     python_path="$(pdm python list | sed -n 's/.*(\(.*\))/\1/p' | head -n 1)" && \
     "${python_path}" -m venv "${HZ_VENV_PATH}"

   # Upgrade packaging tools and install bpy into the venv
   RUN "${HZ_VENV_PYTHON_PATH}" -m pip install --upgrade pip setuptools

   RUN "${HZ_VENV_PYTHON_PATH}" -m pip install bpy==${HZ_BPY_VERSION}
   
   
   # ------------------------------------------------------------------------------------------------ #
   
   
   # Silly macOS Docker workaround for incorrect /proc/cpuinfo
   COPY ./fakefopen.c ${HZ_WORKDIR}/
   RUN cat /proc/cpuinfo >> fake_cpuinfo
   RUN echo "cpu MHz   : 2345.678" >> fake_cpuinfo
   RUN gcc -Wall -fPIC -shared -o fakefopen.so fakefopen.c -ldl
   ENV LD_PRELOAD=${HZ_WORKDIR}/fakefopen.so
   ```
   
   </details>
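
   For reference, the `sed` in the venv-creation step just extracts the interpreter path from the `pdm python list` output. A quick sanity check one can run in the image to see what it resolves to (my own debugging sketch, not part of the build):

   ```python
   # Hypothetical helper mirroring:
   #   pdm python list | sed -n 's/.*(\(.*\))/\1/p' | head -n 1
   import re
   import subprocess

   out = subprocess.run(
       ["pdm", "python", "list"], capture_output=True, text=True, check=True
   ).stdout

   # Take the text inside the parentheses of the first listed interpreter.
   for line in out.splitlines():
       match = re.search(r"\((.*)\)", line)
       if match:
           print(match.group(1))  # first match, like `head -n 1`
           break
   else:
       print("no interpreter found")
   ```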
   
   The `fakefopen.c` workaround (not needed in an actual deployment):
   
   <details>
   <summary>Click to expand!</summary>
   
   ```c
   #define _GNU_SOURCE
   #define FAKE "/home/airflow/fake_cpuinfo"
   #include <stdio.h>
   #include <dlfcn.h>
   #include <string.h>
   
   
   /* LD_PRELOAD shim: redirect reads of /proc/cpuinfo to the patched copy. */
   FILE *fopen(const char *path, const char *mode) {
       FILE *(*original_fopen)(const char *, const char *);
       original_fopen = dlsym(RTLD_NEXT, "fopen");
       if (strcmp(path, "/proc/cpuinfo") == 0) {
           return (*original_fopen)(FAKE, mode);
       } else {
           return (*original_fopen)(path, mode);
       }
   }
   ```
   
   </details>
   
   The test DAG:
   
   <details>
   <summary>Click to expand!</summary>
   
   ```python
   import os
   
   import pendulum
   
   from airflow.decorators import (
       dag,
       task,
   )
   
   
   HZ_VENV_PYTHON_PATH: str = os.environ.get("HZ_VENV_PYTHON_PATH")
   
   
   @dag(
       schedule=None,
       start_date=pendulum.today("UTC"),
       # start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       catchup=False,
       tags=["blender", "test"],
   )
   def blender_test():
       """A basic test to use Blender via a virtualenv with bpy."""
   
       @task.external_python(
           task_id="create_blend_file",
           python=HZ_VENV_PYTHON_PATH,
       )
       def create_blend_file() -> str:
           """Create and save a simple blend file."""
   
           import bpy
   
           out_file_path = "/tmp/monkey.blend"
   
           bpy.ops.mesh.primitive_monkey_add()
           bpy.ops.wm.save_as_mainfile(filepath=out_file_path)
   
           return out_file_path
   
       @task.external_python(
           task_id="read_blend_file_and_render",
           python=HZ_VENV_PYTHON_PATH,
       )
       def read_blend_file_and_render(blend_file_path: str) -> str:
           """Read the blend file and render it."""
   
           import bpy
   
           bpy.ops.wm.open_mainfile(filepath=blend_file_path)
           bpy.context.scene.render.image_settings.file_format = "PNG"
   
           output_file_path = "/tmp/monkey.png"
           bpy.context.scene.render.filepath = output_file_path
   
           bpy.ops.render.render(write_still=True)
   
           return output_file_path
   
       @task.bash
       def rename_render(render_file_path: str) -> None:
           """Use bash to rename the rendered png file."""
   
           return f"mv {render_file_path} /tmp/monkey_renamed.png"
   
       blend_file_path = create_blend_file()
       render_file_path = read_blend_file_and_render(blend_file_path)
       rename_render(render_file_path)
   
   
   blender_test()
   ```
   
   </details>
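
   One caveat I noticed while writing this up: `os.environ.get("HZ_VENV_PYTHON_PATH")` silently returns `None` if the variable is unset, which would pass `python=None` to the decorator. A stricter variant (my own suggestion, not from the docs) fails at parse time instead:

   ```python
   import os

   # Fail at DAG-parse time if the interpreter path is not configured,
   # instead of passing python=None to @task.external_python.
   HZ_VENV_PYTHON_PATH: str = os.environ["HZ_VENV_PYTHON_PATH"]
   ```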
   
   This is the log from the Airflow worker container that shows the error:
   
   <details>
   <summary>Click to expand!</summary>
   
   ```
   
   BACKEND=redis
   DB_HOST=redis
   DB_PORT=6379
   
   [2024-07-12T14:19:48.342-0700] {configuration.py:2087} INFO - Creating new FAB webserver config file in: /opt/airflow/webserver_config.py
    
    -------------- celery@fdf54261b43e v5.4.0 (opalescent)
   --- ***** ----- 
   -- ******* ---- Linux-6.6.31-linuxkit-x86_64-with-glibc2.36 2024-07-12 14:19:58
   - *** --- * --- 
   - ** ---------- [config]
   - ** ---------- .> app:         airflow.providers.celery.executors.celery_executor:0x2aaab7d9d490
   - ** ---------- .> transport:   redis://redis:6379/0
   - ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
   - *** --- * --- .> concurrency: 16 (prefork)
   -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
   --- ***** ----- 
    -------------- [queues]
                   .> default          exchange=default(direct) key=default
                   

   [tasks]
     . airflow.providers.celery.executors.celery_executor_utils.execute_command

   [2024-07-12 14:19:53 -0700] [72] [INFO] Starting gunicorn 22.0.0
   [2024-07-12 14:19:53 -0700] [72] [INFO] Listening at: http://[::]:8793 (72)
   [2024-07-12 14:19:53 -0700] [72] [INFO] Using worker: sync
   [2024-07-12 14:19:53 -0700] [74] [INFO] Booting worker with pid: 74
   [2024-07-12 14:19:53 -0700] [76] [INFO] Booting worker with pid: 76
   [2024-07-12 14:20:02,573: WARNING/MainProcess] /home/airflow/.local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
   whether broker connection retries are made during startup in Celery 6.0 and above.
   If you wish to retain the existing behavior for retrying connections on startup,
   you should set broker_connection_retry_on_startup to True.
     warnings.warn(

   [2024-07-12 14:20:02,642: INFO/MainProcess] Connected to redis://redis:6379/0
   [2024-07-12 14:20:02,648: WARNING/MainProcess] /home/airflow/.local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
   whether broker connection retries are made during startup in Celery 6.0 and above.
   If you wish to retain the existing behavior for retrying connections on startup,
   you should set broker_connection_retry_on_startup to True.
     warnings.warn(

   [2024-07-12 14:20:02,657: INFO/MainProcess] mingle: searching for neighbors
   [2024-07-12 14:20:03,697: INFO/MainProcess] mingle: all alone
   [2024-07-12 14:20:03,787: INFO/MainProcess] celery@fdf54261b43e ready.
   [2024-07-12 14:23:34,221: INFO/MainProcess] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] received
   [2024-07-12 14:23:34,506: INFO/ForkPoolWorker-15] [6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] Executing command in Celery: ['airflow', 'tasks', 'run', 'blender_test', 'create_blend_file', 'manual__2024-07-12T21:23:31.163038+00:00', '--local', '--subdir', 'DAGS_FOLDER/blender_test.py']
   [2024-07-12 14:23:35,814: INFO/ForkPoolWorker-15] Filling up the DagBag from /opt/airflow/dags/blender_test.py
   [2024-07-12 14:23:50,109: INFO/ForkPoolWorker-15] Running <TaskInstance: blender_test.create_blend_file manual__2024-07-12T21:23:31.163038+00:00 [queued]> on host fdf54261b43e
   Traceback (most recent call last):
     File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 563, in from_name
       return next(cls.discover(name=name))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   StopIteration

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "<string>", line 6, in <module>
     File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 1009, in version
       return distribution(distribution_name).version
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 982, in distribution
       return Distribution.from_name(distribution_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/share/pdm/python/[email protected]/lib/python3.11/importlib/metadata/__init__.py", line 565, in from_name
       raise PackageNotFoundError(name)
   importlib.metadata.PackageNotFoundError: No package metadata was found for apache-airflow
   Traceback (most recent call last):
     File "<string>", line 1, in <module>
   ModuleNotFoundError: No module named 'pendulum'
   [2024-07-12 14:23:52,860: INFO/ForkPoolWorker-15] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[6ec4e79c-3488-4a10-b99f-1c2b47bcbb35] succeeded in 18.6186027990002s: None
   ```
   
   </details>
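
   Reading the traceback, the two failures might not come from my function body at all: they look like pre-flight probes that the operator runs in the target interpreter before the task itself, roughly like this (a sketch of my reading of the log, not the actual Airflow source):

   ```python
   import subprocess

   VENV_PYTHON = "/home/airflow/blender_venv/bin/python"  # from the Dockerfile

   # First traceback: a `-c` script asking the venv for airflow's version fails
   # with PackageNotFoundError because apache-airflow is not installed there.
   subprocess.run(
       [VENV_PYTHON, "-c",
        "from importlib.metadata import version; print(version('apache-airflow'))"],
   )

   # Second traceback: `import pendulum` fails with ModuleNotFoundError.
   subprocess.run([VENV_PYTHON, "-c", "import pendulum"])
   ```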
   
   ### Anything else?
   
   I am new to Airflow and am exploring it as an option for integration with our 3D pipeline. I imagine I am just doing something incorrectly, but I haven't been able to figure out what is wrong; as far as I can tell, I am doing things pretty much the same way I have seen in other examples.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

