First of all, please do not use the devlist for troubleshooting.
There are better channels for that: Slack, GitHub Issues, GitHub
Discussions. See https://airflow.apache.org/community/.

Secondly, you probably set your expectations of the Quick Starts too
high. If you want something that works out of the box and that you can
manage easily, use the Helm Chart and Kubernetes. Also, do not forget
that this is free software and you basically get what you pay for. If
you want a managed service, there are companies that offer Airflow as a
managed service; otherwise you need to set it up and configure it
yourself. If you are not able to solve some basic (typical) problems
such as multiple versions of Python, I recommend you go the
managed-service route.

All the options for installing and managing Airflow, and what is
expected of you as a user, can be found here:
https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html
I recommend you read it before taking your next steps.

You likely missed the warning in
https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html
- Docker Compose is just a quick start, and if you break your
installation, you need to fix it on your own. The Quick Start is a
playground for developers and people who want to play with Airflow, and
nothing you do there is meant to be persistent. If you made some
mistakes and mixed different versions of Python, you have to fix that
yourself. I recommend you wipe everything out and start from scratch,
and make sure you experiment in a separate virtualenv/user if you are
unsure what you are doing (which is pretty normal when you experiment
for the first time).
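
If you go that route, a minimal cleanup sketch (assuming you kept the
default quick-start docker-compose.yaml and run the commands from the
directory that contains it; with Compose v2 the commands are
"docker compose ..." instead of "docker-compose ...") would look
roughly like this:

    # Stop and remove the quick-start containers, networks and volumes;
    # this also wipes the metadata database, so you start completely fresh.
    docker-compose down --volumes --remove-orphans

    # Recreate the .env file so the containers run with your host UID
    # instead of the default 50000 (avoids "permission denied" on ./logs).
    echo -e "AIRFLOW_UID=$(id -u)" > .env

    # Re-initialize the database and the default user, then start again.
    docker-compose up airflow-init
    docker-compose up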

The warning is here:
https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#production-readiness

DO NOT expect the Docker Compose file below to be enough to run a
production-ready Airflow installation. This is truly a quick-start
docker-compose for you to get Airflow up and running locally and get
your hands dirty with Airflow. Configuring a Docker Compose installation
that is ready for production requires intimate knowledge of Docker
Compose, a lot of customization, and possibly even writing a Docker
Compose file that suits your needs from scratch. It's probably OK if you
want to run a Docker Compose-based deployment, but short of becoming a
Docker Compose expert, it is highly unlikely you will get a robust
deployment with it.

If you want an easy-to-configure, Docker-based deployment that the
Airflow community develops, supports, and can help you with, you should
consider using Kubernetes and deploying Airflow with the official
Airflow community Helm Chart.
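
For completeness, a bare-bones sketch of installing it (assuming you
already have a Kubernetes cluster and helm set up; the release name and
namespace "airflow" below are just placeholders):

    # Add the official Airflow chart repository and install the chart.
    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
    helm upgrade --install airflow apache-airflow/airflow \
        --namespace airflow --create-namespace

The chart documentation (https://airflow.apache.org/docs/helm-chart/stable/)
covers the values you can override for a real deployment.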

This warning has been extended recently (not yet published):

https://github.com/apache/airflow/blob/main/docs/apache-airflow/start/docker.rst

Customizing the quick-start Docker Compose

DO NOT attempt to customize images and the Docker Compose if you do
not know exactly what you are doing, do not know Docker Compose, or
are not prepared to debug and resolve problems on your own. If you do
not know Docker Compose and expect Airflow to just work beyond
following precisely the quick-start, then please use other ways of
running Airflow - for example :doc:`/start/local` for testing and
trying and :doc:`Official Airflow Community Helm
Chart<helm-chart:index>` for production purposes.

Even if many users think of Docker Compose as "ready to use", it is
really a developer tool that requires the user to know very well how
Docker images, containers, Docker Compose networking, volumes, naming,
and image building work. It is extremely easy to make mistakes that lead
to hard-to-diagnose problems, and if you are not ready to spend your own
time learning, diagnosing, and resolving those problems yourself, do not
follow this path. You have been warned.

If you customize or modify the images or the compose file and run into
problems, do not expect to get a lot of help solving them in the Airflow
support channels. Most of the problems you will experience are Docker
Compose problems, and if you need help solving them, there are dedicated
Docker Compose channels you can use.


J.

On Thu, Mar 31, 2022 at 11:51 PM Bob Van <[email protected]> wrote:
>
> I found that the scheduler was filling the log with permission denied errors
> where user 50000 had apparently taken over the log directory, so I was able to
> get it working again with a chmod -R 777.
>
>
>
> In one test I didn't get 'unsupported pickle protocol', but I could click into
> the failures in the UI and see the logged info. However, in the console I would
> see "Celery command failed on host: 5b549d6501cd" along with references to
> py3.7 libs, and also "Dag 'Etl' could not be found; either it does not exist or
> it failed to parse."
>
>
>
> After another retry, I’m back to the ‘unsupported pickle protocol: 5’ errors
>
> airflow-worker_1     | [2022-03-31 21:18:08,599: ERROR/ForkPoolWorker-15] Failed to execute task Dag 'Etl' could not be found; either it does not exist or it failed to parse..
>
> airflow-worker_1     | Traceback (most recent call last):
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 121, in _execute_in_fork
> airflow-worker_1     |     args.func(args)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
> airflow-worker_1     |     return func(*args, **kwargs)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
> airflow-worker_1     |     return f(*args, **kwargs)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 282, in task_run
> airflow-worker_1     |     dag = get_dag(args.subdir, args.dag_id)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 193, in get_dag
> airflow-worker_1     |     f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
> airflow-worker_1     | airflow.exceptions.AirflowException: Dag 'Etl' could not be found; either it does not exist or it failed to parse.
>
> airflow-worker_1     | [2022-03-31 21:18:08,609: ERROR/ForkPoolWorker-15] Task airflow.executors.celery_executor.execute_command[ba1eaa98-a098-4b71-9282-4a7f601a088e] raised unexpected: AirflowException('Celery command failed on host: 5b549d6501cd')
>
> airflow-worker_1     | Traceback (most recent call last):
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
> airflow-worker_1     |     R = retval = fun(*args, **kwargs)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
> airflow-worker_1     |     return self.run(*args, **kwargs)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 90, in execute_command
> airflow-worker_1     |     _execute_in_fork(command_to_exec, celery_task_id)
> airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 101, in _execute_in_fork
> airflow-worker_1     |     raise AirflowException('Celery command failed on host: ' + get_hostname())
> airflow-worker_1     | airflow.exceptions.AirflowException: Celery command failed on host: 5b549d6501cd
>
> airflow-scheduler_1  | [2022-03-31 21:18:08,834] {scheduler_job.py:533} INFO - Executor reports execution of Etl.get_data run_id=scheduled__2022-03-30T00:00:00+00:00 exited with status failed for try_number 1
>
>
>
>
>
>
>
> From: Bob Van
> Sent: Thursday, March 31, 2022 3:07 PM
> To: [email protected]
> Subject: unsupported pickle protocol?
>
>
>
> I'm interested in using Airflow for some ETL processing, so I have set it up
> in Windows WSL Ubuntu and have the first example pipeline definition tutorial
> working.
>
>
>
> For the 'ETL' example with Docker, I set up Docker in Ubuntu and revised the
> docker-compose file with my paths to the Airflow home, the connection to
> PostgreSQL running on the host Windows box, and the admin credentials for the
> Airflow webserver.
>
>
>
> With 'docker-compose up' I get an 'address already in use' error, since the
> process starts the webserver, so I found I needed to shut the webserver down
> first. I also tried shutting down the scheduler, since it also gets started,
> but then my DAG files didn't get picked up.
>
>
>
> When I run the 'ETL' dag I get an 'unsupported pickle protocol: 5' error,
> apparently because I have Python 3.8 but the process is referencing 3.7
> libraries.
>
>
>
> -- airflow info
>
> Apache Airflow
>
> version                | 2.2.4
>
> executor               | LocalExecutor
>
>
>
> python_version  | 3.8.10 (default, Mar 15 2022, 12:22:08)  [GCC 9.4.0]
>
> python_location | /usr/bin/python3
>
>
>
>
>
> -- error info
>
> Python version: 3.7.12
>
> Airflow version: 2.2.4
>
>
>
> ....
>
>   File "/home/airflow/.local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
>     obj = StockUnpickler.load(self)
> ValueError: unsupported pickle protocol: 5
>
>
>
>
>
> What’s the point of installing airflow with explicit dependencies like py3.8 
> if the processes are just going to use some incompatible version?
>
>
>
> Where can I configure the python reference so airflow doesn't go to the 3.7 
> version?
>
>
>
> I ran 'airflow standalone' and various other things while trying to figure out
> the setup, which may be why there's a 'home/airflow' with the wrong Python.
>
>
>
>
