First of all - please do not use the devlist for troubleshooting. There are better channels for that: Slack, GitHub Issues, GitHub Discussions. See https://airflow.apache.org/community/.
Secondly - you probably set your expectations for the Quick Start too high. If you want something that works out of the box and is easy to manage, use the Helm Chart and Kubernetes. Also, do not forget this is free software - basically, you get what you pay for. If you want a managed service, there are companies that offer Airflow as a managed service; otherwise you need to set it up and configure it yourself. If you are not able to solve basic (typical) problems like mixing multiple versions of Python, I recommend going the managed-service route. All the options for installing and managing Airflow, and what is expected from you as a user for each of them, are described at https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html - I recommend reading that page before you take any next steps.

You likely missed the warning in https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html - Docker Compose is just a quick start, and if you mess up your installation, you need to fix it on your own. The quick start is a playground for developers and for people who want to try Airflow out, and nothing you do there is meant to be persistent. If you made some mistakes and mixed different versions of Python, you should fix that yourself. I recommend wiping everything out and starting from scratch, and making sure you experiment in a separate virtualenv/user if you are unsure what you are doing (which is pretty normal when you are experimenting for the first time).
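For reference, here is a rough sketch of what "wipe everything out and start from scratch" means for the quick start. It assumes you kept the quick-start layout (./dags, ./logs, ./plugins next to docker-compose.yaml) and run the commands from that directory; the version numbers (2.2.4, Python 3.8) are taken from your "airflow info" output, so adjust as needed:

    # Remove the quick-start containers, named volumes (including the metadata DB) and orphans
    docker-compose down --volumes --remove-orphans

    # Clear the bind-mounted folders so no stale logs/state survive
    # (may need sudo if files were created as UID 50000 inside the containers)
    rm -rf ./logs/* ./plugins/*

    # Recreate the expected layout and pin your host UID so file ownership matches;
    # note this overwrites .env - merge by hand if you keep other variables there
    mkdir -p ./dags ./logs ./plugins
    echo -e "AIRFLOW_UID=$(id -u)" > .env

    # Re-initialize the database and the admin user, then start again
    docker-compose up airflow-init
    docker-compose up

And if you want to experiment outside Docker at the same time, keep it in a throw-away virtualenv (the ~/airflow-venv path is just an example) so the interpreters never mix:

    python3 -m venv ~/airflow-venv && source ~/airflow-venv/bin/activate
    pip install "apache-airflow==2.2.4" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.4/constraints-3.8.txt"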
The warning is here: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#production-readiness

DO NOT expect the Docker Compose below will be enough to run production-ready Docker Compose Airflow installation using it. This is truly quick-start docker-compose for you to get Airflow up and running locally and get your hands dirty with Airflow. Configuring a Docker-Compose installation that is ready for production requires an intrinsic knowledge of Docker Compose, a lot of customization and possibly even writing the Docker Compose file that will suit your needs from the scratch. It's probably OK if you want to run Docker Compose-based deployment, but short of becoming a Docker Compose expert, it's highly unlikely you will get robust deployment with it.

If you want to get an easy to configure Docker-based deployment that Airflow Community develops, supports and can provide support with deployment, you should consider using Kubernetes and deploying Airflow using Official Airflow Community Helm Chart.

This warning has been extended recently (not yet published): https://github.com/apache/airflow/blob/main/docs/apache-airflow/start/docker.rst

Customizing the quick-start Docker Compose

DO NOT attempt to customize images and the Docker Compose if you do not know exactly what you are doing, do not know Docker Compose, or are not prepared to debug and resolve problems on your own. If you do not know Docker Compose and expect Airflow to just work beyond following precisely the quick-start, then please use other ways of running Airflow - for example :doc:`/start/local` for testing and trying and :doc:`Official Airflow Community Helm Chart<helm-chart:index>` for production purposes.

Even if many users think of Docker Compose as "ready to use", it is really a developer tool that requires the user to know very well how docker images, containers, docker compose networking, volumes, naming and image building work. It is extremely easy to make mistakes that lead to difficult-to-diagnose problems, and if you are not ready to spend your own time on learning, diagnosing and resolving those problems on your own, do not follow this path. You have been warned.

If you customize or modify the images or the compose file and then see a problem, do not expect to get a lot of help solving it in the Airflow support channels. Most of the problems you will experience are Docker Compose problems, and if you need help with those, there are dedicated Docker Compose channels you can use.

J.

On Thu, Mar 31, 2022 at 11:51 PM Bob Van <[email protected]> wrote:
> I found that the scheduler was filling the log with permission denied errors where user 50000 had apparently taken over the log directory, so was able to get it working again with a chmod -R 777.
>
> In one test I didn't get 'unsupported pickle protocol', but could click into the failures in the UI and see logged info. However in the console I would see "Celery command failed on host: 5b549d6501cd" along with references to py3.7 libs. And also "Dag 'Etl' could not be found; either it does not exist or it failed to parse."
>
> After another retry, I'm back to the 'unsupported pickle protocol: 5' errors:
>
> airflow-worker_1 | [2022-03-31 21:18:08,599: ERROR/ForkPoolWorker-15] Failed to execute task Dag 'Etl' could not be found; either it does not exist or it failed to parse..
> airflow-worker_1 | Traceback (most recent call last):
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 121, in _execute_in_fork
> airflow-worker_1 |     args.func(args)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
> airflow-worker_1 |     return func(*args, **kwargs)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
> airflow-worker_1 |     return f(*args, **kwargs)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 282, in task_run
> airflow-worker_1 |     dag = get_dag(args.subdir, args.dag_id)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 193, in get_dag
> airflow-worker_1 |     f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
> airflow-worker_1 | airflow.exceptions.AirflowException: Dag 'Etl' could not be found; either it does not exist or it failed to parse.
>
> airflow-worker_1 | [2022-03-31 21:18:08,609: ERROR/ForkPoolWorker-15] Task airflow.executors.celery_executor.execute_command[ba1eaa98-a098-4b71-9282-4a7f601a088e] raised unexpected: AirflowException('Celery command failed on host: 5b549d6501cd')
> airflow-worker_1 | Traceback (most recent call last):
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
> airflow-worker_1 |     R = retval = fun(*args, **kwargs)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
> airflow-worker_1 |     return self.run(*args, **kwargs)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 90, in execute_command
> airflow-worker_1 |     _execute_in_fork(command_to_exec, celery_task_id)
> airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 101, in _execute_in_fork
> airflow-worker_1 |     raise AirflowException('Celery command failed on host: ' + get_hostname())
> airflow-worker_1 | airflow.exceptions.AirflowException: Celery command failed on host: 5b549d6501cd
>
> airflow-scheduler_1 | [2022-03-31 21:18:08,834] {scheduler_job.py:533} INFO - Executor reports execution of Etl.get_data run_id=scheduled__2022-03-30T00:00:00+00:00 exited with status failed for try_number 1
>
> From: Bob Van
> Sent: Thursday, March 31, 2022 3:07 PM
> To: [email protected]
> Subject: unsupported pickle protocol?
>
> I'm interested in using airflow for some etl processing so have set it up in windows wsl ubuntu and have the first example pipeline definition tutorial working.
>
> For the 'ETL' example with docker, I setup docker in ubuntu and have revised the docker compose file with my paths to airflow home, psql dbase connection to psql running on the host win box, and the admin credentials for the airflow webserver.
>
> With 'docker-compose up' I get an 'address already in use' error, since the process starts the webserver, so I found I needed to shut the webserver down first. I also tried shutting down the scheduler, since it also gets started, but then my dag files didn't get picked up.
>
> When I run the 'ETL' dag I get an 'unsupported pickle protocol: 5' error, apparently because I have python 3.8 but the process is referencing 3.7 libraries.
>
> -- airflow info
> Apache Airflow
> version         | 2.2.4
> executor        | LocalExecutor
>
> python_version  | 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
> python_location | /usr/bin/python3
>
> -- error info
> Python version: 3.7.12
> Airflow version: 2.2.4
>
> ....
> File "/home/airflow/.local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
>     obj = StockUnpickler.load(self)
> ValueError: unsupported pickle protocol: 5
>
> What's the point of installing airflow with explicit dependencies like py3.8 if the processes are just going to use some incompatible version?
>
> Where can I configure the python reference so airflow doesn't go to the 3.7 version?
>
> I ran 'airflow standalone' and various other things while trying to figure out the setup, which may be why there's a 'home/airflow' with the wrong python.
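As a side note on the 'unsupported pickle protocol: 5' error quoted above: pickle protocol 5 only exists in Python 3.8 and later, so anything pickled by a 3.8 interpreter cannot be read by the Python 3.7 interpreter baked into the images you are running. A quick way to confirm the mismatch is sketched below - the airflow-worker service name is taken from the quick-start compose file, and the containers must be running:

    # Interpreter on the WSL host (where 'airflow standalone' and 'airflow info' ran)
    python3 --version
    # Interpreter inside the quick-start images
    docker-compose exec airflow-worker python --version
    # If the first prints 3.8.x and the second 3.7.x, data pickled on the host
    # (protocol 5) cannot be unpickled inside the containers.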
