I found that the scheduler was filling the log with permission denied errors 
where user 50000 had apparently taken over the log directory, so I was able to 
get it working again with a chmod -R 777.
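
For anyone hitting the same thing, a less blunt fix that seems to work with the 
stock docker-compose.yaml (which, as far as I can tell, runs the containers as 
user "${AIRFLOW_UID:-50000}:0") is to have the containers run as your own user 
and fix the ownership of the mounted directories. A rough sketch, assuming the 
compose file sits next to ./dags, ./logs and ./plugins:

# run from the directory containing docker-compose.yaml
echo "AIRFLOW_UID=$(id -u)" > .env            # have the containers run as your WSL user
mkdir -p ./dags ./logs ./plugins
sudo chown -R "$(id -u):0" ./logs ./dags ./plugins   # or chown to 50000:0 to keep the default
docker-compose down && docker-compose up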

In one test I didn't get 'unsupported pickle protocol', but could click into 
the failures in the UI and see the logged info. However, in the console I would 
see "Celery command failed on host: 5b549d6501cd" along with references to 
py3.7 libs, and also "Dag 'Etl' could not be found; either it does not exist or 
it failed to parse."

After another retry, I'm back to the 'unsupported pickle protocol: 5' errors.



airflow-worker_1     | [2022-03-31 21:18:08,599: ERROR/ForkPoolWorker-15] 
Failed to execute task Dag 'Etl' could not be found; either it does not exist 
or it failed to parse..

airflow-worker_1     | Traceback (most recent call last):
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 121, in _execute_in_fork
airflow-worker_1     |     args.func(args)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", 
line 48, in command
airflow-worker_1     |     return func(*args, **kwargs)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 
92, in wrapper
airflow-worker_1     |     return f(*args, **kwargs)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py",
 line 282, in task_run
airflow-worker_1     |     dag = get_dag(args.subdir, args.dag_id)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 
193, in get_dag
airflow-worker_1     |     f"Dag {dag_id!r} could not be found; either it does 
not exist or it failed to parse."
airflow-worker_1     | airflow.exceptions.AirflowException: Dag 'Etl' could not 
be found; either it does not exist or it failed to parse.

airflow-worker_1     | [2022-03-31 21:18:08,609: ERROR/ForkPoolWorker-15] Task 
airflow.executors.celery_executor.execute_command[ba1eaa98-a098-4b71-9282-4a7f601a088e]
 raised unexpected: AirflowException('Celery command failed on host: 
5b549d6501cd')

airflow-worker_1     | Traceback (most recent call last):
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 
451, in trace_task
airflow-worker_1     |     R = retval = fun(*args, **kwargs)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/celery/app/trace.py", line 
734, in __protected_call__
airflow-worker_1     |     return self.run(*args, **kwargs)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 90, in execute_command
airflow-worker_1     |     _execute_in_fork(command_to_exec, celery_task_id)
airflow-worker_1     |   File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py",
 line 101, in _execute_in_fork
airflow-worker_1     |     raise AirflowException('Celery command failed on 
host: ' + get_hostname())
airflow-worker_1     | airflow.exceptions.AirflowException: Celery command 
failed on host: 5b549d6501cd

airflow-scheduler_1  | [2022-03-31 21:18:08,834] {scheduler_job.py:533} INFO - 
Executor reports execution of Etl.get_data 
run_id=scheduled__2022-03-30T00:00:00+00:00 exited with status failed for 
try_number 1



From: Bob Van
Sent: Thursday, March 31, 2022 3:07 PM
To: [email protected]
Subject: unsupported pickle protocol?

I'm interested in using Airflow for some ETL processing, so I have set it up in 
Windows WSL Ubuntu and have the first example pipeline definition from the 
tutorial working.

For the 'ETL' example with Docker, I set up Docker in Ubuntu and revised the 
docker-compose file with my paths to the Airflow home, the PostgreSQL 
connection pointing at the Postgres instance running on the host Windows box, 
and the admin credentials for the Airflow webserver.
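
Roughly, the kind of overrides I mean look like the sketch below. The values 
are placeholders rather than my real ones, and the variable names assume the 
stock docker-compose.yaml from the Airflow docs, which reads an optional .env 
file sitting next to it:

# sketch of a .env next to docker-compose.yaml (placeholder values)
cat > .env <<'EOF'
AIRFLOW_UID=50000
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=admin
EOF

# plus, in the x-airflow-common environment block of docker-compose.yaml, a line
# pointing the metadata DB at Postgres on the Windows host, something like:
#   AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@host.docker.internal:5432/airflow
# (host.docker.internal resolves under Docker Desktop; a plain Docker install in WSL may need the host IP instead)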

With 'docker-compose up' I get an 'address already in use' error, since 
docker-compose starts its own webserver, so I found I needed to shut my local 
webserver down first. I also tried shutting down the local scheduler, since one 
also gets started, but then my DAG files didn't get picked up.
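
For what it's worth, something along these lines shows what is holding the port 
and stops the local processes before bringing the containers up (assuming the 
default port 8080 and that the local webserver/scheduler were started from my 
shell; the pkill patterns are just reasonable guesses at matching those 
processes):

# see what is already listening on the webserver port (8080 by default)
ss -ltnp | grep 8080              # or: lsof -i :8080

# stop the locally installed webserver/scheduler before starting the containers
pkill -f "airflow webserver"
pkill -f "airflow scheduler"      # optional, given the DAG pickup issue noted above

docker-compose up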

When I run the 'ETL' DAG I get an 'unsupported pickle protocol: 5' error, 
apparently because I have Python 3.8 but the process is referencing 3.7 
libraries.
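
A quick way to see the mismatch (assuming the stock compose service name 
airflow-worker and running docker-compose from the compose directory) is to 
compare the interpreter on the host with the one inside the container; pickle 
protocol 5 only exists from Python 3.8 onwards, so anything pickled with it on 
the host can't be unpickled by a 3.7 interpreter:

# Python on the host (WSL)
python3 -V
python3 -c 'import pickle; print(pickle.HIGHEST_PROTOCOL)'    # 5 on Python 3.8

# Python inside the worker container (service name from the stock compose file)
docker-compose exec airflow-worker python -V
docker-compose exec airflow-worker python -c 'import pickle; print(pickle.HIGHEST_PROTOCOL)'    # 4 on Python 3.7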

-- airflow info
Apache Airflow
version                | 2.2.4
executor               | LocalExecutor

python_version  | 3.8.10 (default, Mar 15 2022, 12:22:08)  [GCC 9.4.0]
python_location | /usr/bin/python3


-- error info
Python version: 3.7.12
Airflow version: 2.2.4

....
  File "/home/airflow/.local/lib/python3.7/site-packages/dill/_dill.py", line 
472, in load
    obj = StockUnpickler.load(self)
ValueError: unsupported pickle protocol: 5


What's the point of installing Airflow with explicit dependencies like py3.8 if 
the processes are just going to use some incompatible version?

Where can I configure the Python reference so Airflow doesn't go to the 3.7 
version?

I ran 'airflow standalone' and various other things while trying to figure out 
the setup, which may be why there's a 'home/airflow' with the wrong Python.

