I found that the scheduler was filling the log with permission denied errors, where user 50000 had apparently taken ownership of the log directory, so I was able to get it working again with a 'chmod -R 777' on that directory.
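For the record, a cleaner fix than 777 might be the AIRFLOW_UID approach from the docker-compose quick start, so the containers write logs as your own user instead of uid 50000. A rough sketch, assuming the stock compose layout with ./dags, ./logs and ./plugins:

mkdir -p ./dags ./logs ./plugins
echo "AIRFLOW_UID=$(id -u)" > .env
# reclaim anything the containers already wrote as uid 50000
sudo chown -R "$(id -u):$(id -g)" ./logs
docker-compose down && docker-compose up -d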
In one test I didn't get 'unsupported pickle protocol', but could click into the failures in the UI and see the logged info. However, in the console I would see "Celery command failed on host: 5b549d6501cd" along with references to py3.7 libs, and also "Dag 'Etl' could not be found; either it does not exist or it failed to parse."
After another retry, I'm back to the 'unsupported pickle protocol: 5' errors:
airflow-worker_1 | [2022-03-31 21:18:08,599: ERROR/ForkPoolWorker-15] Failed to execute task Dag 'Etl' could not be found; either it does not exist or it failed to parse..
airflow-worker_1 | Traceback (most recent call last):
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 121, in _execute_in_fork
airflow-worker_1 |     args.func(args)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
airflow-worker_1 |     return func(*args, **kwargs)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
airflow-worker_1 |     return f(*args, **kwargs)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 282, in task_run
airflow-worker_1 |     dag = get_dag(args.subdir, args.dag_id)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 193, in get_dag
airflow-worker_1 |     f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
airflow-worker_1 | airflow.exceptions.AirflowException: Dag 'Etl' could not be found; either it does not exist or it failed to parse.
airflow-worker_1 | [2022-03-31 21:18:08,609: ERROR/ForkPoolWorker-15] Task airflow.executors.celery_executor.execute_command[ba1eaa98-a098-4b71-9282-4a7f601a088e] raised unexpected: AirflowException('Celery command failed on host: 5b549d6501cd')
airflow-worker_1 | Traceback (most recent call last):
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
airflow-worker_1 |     R = retval = fun(*args, **kwargs)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
airflow-worker_1 |     return self.run(*args, **kwargs)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 90, in execute_command
airflow-worker_1 |     _execute_in_fork(command_to_exec, celery_task_id)
airflow-worker_1 |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 101, in _execute_in_fork
airflow-worker_1 |     raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow-worker_1 | airflow.exceptions.AirflowException: Celery command failed on host: 5b549d6501cd
airflow-scheduler_1 | [2022-03-31 21:18:08,834] {scheduler_job.py:533} INFO - Executor reports execution of Etl.get_data run_id=scheduled__2022-03-30T00:00:00+00:00 exited with status failed for try_number 1
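In case it helps with the 'could not be found' part: the workers only see DAG files that are mounted into their containers, so a quick sanity check is something like this (service names assume the stock compose file):

# is the Etl DAG file actually visible inside the worker container?
docker-compose exec airflow-worker ls /opt/airflow/dags
# does Airflow inside the container parse it, and under which dag_id?
docker-compose exec airflow-worker airflow dags list
docker-compose exec airflow-worker airflow dags list-import-errors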
From: Bob Van
Sent: Thursday, March 31, 2022 3:07 PM
To: [email protected]
Subject: unsupported pickle protocol?
I'm interested in using Airflow for some ETL processing, so I have set it up in Windows WSL Ubuntu and have the first example pipeline definition tutorial working.
For the 'ETL' example with Docker, I set up Docker in Ubuntu and revised the docker-compose file with my paths to the Airflow home, the Postgres connection for the instance running on the host Windows box, and the admin credentials for the Airflow webserver.
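To double-check that those edits are what the containers actually get, the merged configuration can be printed, e.g. (the grep patterns are just examples):

docker-compose config | grep -i -E "sql_alchemy_conn|/dags|AIRFLOW_UID"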
With 'docker-compose up' I get an 'address already in use' error, since the compose stack starts its own webserver, so I found I needed to shut my existing webserver down first. I also tried shutting down the scheduler, since one of those also gets started, but then my dag files didn't get picked up.
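By shutting the webserver down I mean roughly the following (the port is the default 8080, and pkill is just one way to do it):

# see what is still bound to the webserver port
sudo lsof -i :8080
# stop the webserver left over from the earlier non-docker setup
pkill -f "airflow webserver"
docker-compose up -d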
When I run the 'ETL' dag I get an 'unsupported pickle protocol: 5' error, apparently because I have Python 3.8 (pickle protocol 5 is new in 3.8) but the process is referencing Python 3.7 libraries, which can't read it.
-- airflow info
Apache Airflow
version | 2.2.4
executor | LocalExecutor
python_version | 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
python_location | /usr/bin/python3
-- error info
Python version: 3.7.12
Airflow version: 2.2.4
....
File "/home/airflow/.local/lib/python3.7/site-packages/dill/_dill.py", line
472, in load
obj = StockUnpickler.load(self)
ValueError: unsupported pickle protocol: 5
What's the point of installing airflow with explicit dependencies like py3.8 if
the processes are just going to use some incompatible version?
Where can I configure the python reference so airflow doesn't go to the 3.7
version?
I ran 'airflow standalone' and various other things while trying to figure out the setup, which may be why there's a '/home/airflow' with the wrong Python.
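For what it's worth, comparing what each piece reports should show where the 3.7 is coming from; something like this (service names assume the stock compose file):

# the host-level install (the 3.8 that 'airflow info' reports above)
which airflow && python3 --version
# inside the containers (the traceback paths point at a python3.7 there)
docker-compose exec airflow-scheduler python --version
docker-compose exec airflow-worker airflow info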