koliankolin opened a new issue #14975:
URL: https://github.com/apache/airflow/issues/14975
**Apache Airflow version**: 2.0.1
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): No
**Environment**:
- **Cloud provider or hardware configuration**: Local Docker 7GB RAM, 2 CPU
- **OS** (e.g. from /etc/os-release): Debian GNU/Linux
- **Kernel** (e.g. `uname -a`): Linux 97bb49fdb749 4.19.121-linuxkit
- **Install tools**: pip
- **Others**:
**What happened**:
I tried to manually trigger, from the UI, a DAG that had been pickled into the
metadata DB, and the run crashed.
**What you expected to happen**:
All tasks run and finish successfully.
**How to reproduce it**:
1) Pickle the DAG with the DagPickle model. For example, given an instance of an
already created DAG, call `dag.pickle()` (see the sketch after this list).
2) From the UI, trigger the DAG that was pickled.
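A minimal sketch of step 1 (assuming an Airflow 2.0.x environment with the
metadata DB initialised; the `dummy` dag_id and `init` task only mirror the
logs below):
```python
# Minimal sketch of step 1. Assumptions: Airflow 2.0.x is installed and the
# metadata DB is initialised; dag_id/task_id only mirror the logs below.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="dummy",
    start_date=datetime(2021, 3, 1),
    schedule_interval=None,
) as dag:
    init = PythonOperator(task_id="init", python_callable=lambda: None)

# DAG.pickle() stores the DAG as a DagPickle row in the metadata DB and
# returns it; its .id is the pickle id that later appears as `--pickle <id>`.
dag_pickle = dag.pickle()
print(dag_pickle.id)
```
(In my case the DAG itself is defined in
`/usr/local/airflow/dags/dag_generator.py`, as seen in the logs below.)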
Using `docker logs -f name_of_container` I see the following logs:
```
[2021-03-19 09:34:54,735] {scheduler_job.py:975} INFO - Figuring out tasks
to run in Pool(name=default_pool) with 128 open slots and 1 task instances
ready to be queued
[2021-03-19 09:34:54,736] {scheduler_job.py:1002} INFO - DAG dummy has 0/16
running and queued tasks
[2021-03-19 09:34:54,737] {scheduler_job.py:1063} INFO - Setting the
following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:34:54,743] {scheduler_job.py:1105} INFO - Sending
TaskInstanceKey(dag_id='dummy', task_id='init',
execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867,
tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue
default
[2021-03-19 09:34:54,744] {base_executor.py:82} INFO - Adding to queue:
['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool',
'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:34:54,749] {local_executor.py:81} INFO - QueuedLocalWorker
running ['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool',
'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
on host 49442bde5388
[2021-03-19 09:34:58,081] {scheduler_job.py:1206} INFO - Executor reports
execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited
with status success for try_number 1
[2021-03-19 09:34:58 +0000] [550] [INFO] Handling signal: ttin
[2021-03-19 09:34:58 +0000] [37648] [INFO] Booting worker with pid: 37648
[2021-03-19 09:35:05 +0000] [550] [INFO] Handling signal: ttou
[2021-03-19 09:35:05 +0000] [32497] [INFO] Worker exiting (pid: 32497)
[2021-03-19 09:35:06,926] {scheduler_job.py:941} INFO - 1 tasks up for
execution:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,929] {scheduler_job.py:975} INFO - Figuring out tasks
to run in Pool(name=default_pool) with 128 open slots and 1 task instances
ready to be queued
[2021-03-19 09:35:06,929] {scheduler_job.py:1002} INFO - DAG dummy has 0/16
running and queued tasks
[2021-03-19 09:35:06,929] {scheduler_job.py:1063} INFO - Setting the
following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,931] {scheduler_job.py:1105} INFO - Sending
TaskInstanceKey(dag_id='dummy', task_id='init',
execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867,
tzinfo=Timezone('UTC')), try_number=2) to executor with priority 7 and queue
default
[2021-03-19 09:35:06,931] {base_executor.py:82} INFO - Adding to queue:
['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool',
'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:35:06,935] {local_executor.py:81} INFO - QueuedLocalWorker
running ['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool',
'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
on host 49442bde5388
[2021-03-19 09:35:09,827] {scheduler_job.py:1206} INFO - Executor reports
execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited
with status success for try_number 2
[2021-03-19 09:35:18,884] {dagrun.py:430} ERROR - Marking run <DagRun dummy
@ 2021-03-19 09:34:53.681867+00:00: manual__2021-03-19T09:34:53.681867+00:00,
externally triggered: True> failed
```
And the following logs for the failed task, from the UI:
```
*** Reading local file:
/usr/local/airflow/logs/dummy/init/4-20210319T093453/2.log
>2021-03-19 09:35:09,215|INFO|taskinstance.py:851|Dependencies all met for
<TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:851|Dependencies all met for
<TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1043|Starting attempt 2 of 2
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,247|INFO|taskinstance.py:1063|Executing
<Task(PythonOperator): init> on 2021-03-19T09:34:53.681867+00:00
>2021-03-19 09:35:09,252|INFO|standard_task_runner.py:52|Started process
38115 to run task
>2021-03-19 09:35:09,257|INFO|standard_task_runner.py:76|Running:
['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--job-id', '25',
'--pool', 'default_pool', '--raw', '--cfg-path', '/tmp/tmpo6temkwx',
'--error-file', '/tmp/tmpv5y3gphw']
>2021-03-19 09:35:09,258|INFO|standard_task_runner.py:77|Job 25: Subtask init
>2021-03-19 09:35:09,284|INFO|local_task_job.py:146|Task exited with return
code 1
```
**Anything else we need to know**:
**When I cleaned the dag_pickle table in the metadata DB, cleared the last_pickle
column in the dag table, and stopped pickling DAGs altogether, everything worked
well.**
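A rough sketch of the kind of cleanup I mean, via the Airflow ORM (assumption:
the column I called `last_pickle` corresponds to `DagModel.last_pickled` /
`DagModel.pickle_id`; back up the metadata DB before running anything like this):
```python
# Rough sketch of the cleanup described above. Assumptions: the Airflow ORM
# models are available, and DagModel.pickle_id / DagModel.last_pickled are the
# columns holding the pickle references. Back up the metadata DB first.
from airflow.models import DagModel, DagPickle
from airflow.utils.session import create_session

with create_session() as session:
    # Empty the dag_pickle table.
    session.query(DagPickle).delete()
    # Clear the pickle references on the dag table rows.
    session.query(DagModel).update(
        {DagModel.pickle_id: None, DagModel.last_pickled: None},
        synchronize_session=False,
    )
```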
To support this statement, here are logs where the `--pickle` flag wasn't used
and everything worked successfully:
```
[2021-03-19 10:22:15,074] {scheduler_job.py:941} INFO - 1 tasks up for
execution:
<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,080] {scheduler_job.py:975} INFO - Figuring out tasks
to run in Pool(name=default_pool) with 128 open slots and 1 task instances
ready to be queued
[2021-03-19 10:22:15,082] {scheduler_job.py:1002} INFO - DAG dummy has 0/16
running and queued tasks
[2021-03-19 10:22:15,085] {scheduler_job.py:1063} INFO - Setting the
following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,089] {scheduler_job.py:1105} INFO - Sending
TaskInstanceKey(dag_id='dummy', task_id='init',
execution_date=datetime.datetime(2021, 3, 19, 10, 21, 11, 696790,
tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue
default
[2021-03-19 10:22:15,090] {base_executor.py:82} INFO - Adding to queue:
['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool',
'--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,096] {local_executor.py:81} INFO - QueuedLocalWorker
running ['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool',
'--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,320] {dagbag.py:448} INFO - Filling up the DagBag from
/usr/local/airflow/dags/dag_generator.py
Running <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
on host 49442bde5388
[2021-03-19 10:22:18,384] {scheduler_job.py:941} INFO - 2 tasks up for
execution:
<TaskInstance: dummy.df_now1 2021-03-19 10:21:11.696790+00:00
[scheduled]>
<TaskInstance: dummy.df_now2 2021-03-19 10:21:11.696790+00:00
[scheduled]>
[2021-03-19 10:22:18,386] {scheduler_job.py:975} INFO - Figuring out tasks
to run in Pool(name=default_pool) with 128 open slots and 2 task instances
ready to be queued
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 0/16
running and queued tasks
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 1/16
running and queued tasks
```
And the corresponding logs from the UI for this task:
```
*** Reading local file:
/usr/local/airflow/logs/dummy/init/5-20210319T102111/1.log
>2021-03-19 10:22:17,638|INFO|taskinstance.py:851|Dependencies all met for
<TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:851|Dependencies all met for
<TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1043|Starting attempt 1 of 2
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,669|INFO|taskinstance.py:1063|Executing
<Task(PythonOperator): init> on 2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,674|INFO|standard_task_runner.py:52|Started process
6413 to run task
>2021-03-19 10:22:17,682|INFO|standard_task_runner.py:76|Running:
['airflow', 'tasks', 'run', 'dummy', 'init',
'2021-03-19T10:21:11.696790+00:00', '--job-id', '26', '--pool', 'default_pool',
'--raw', '--subdir', 'DAGS_FOLDER/dag_generator.py', '--cfg-path',
'/tmp/tmpqkmkygyb', '--error-file', '/tmp/tmp2k6izpaa']
>2021-03-19 10:22:17,683|INFO|standard_task_runner.py:77|Job 26: Subtask init
>2021-03-19 10:22:17,733|INFO|logging_mixin.py:104|Running <TaskInstance:
dummy.init 2021-03-19T10:21:11.696790+00:00 [running]> on host 49442bde5388
>2021-03-19 10:22:17,777|INFO|taskinstance.py:1257|Exporting the following
env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dummy
AIRFLOW_CTX_TASK_ID=init
AIRFLOW_CTX_EXECUTION_DATE=2021-03-19T10:21:11.696790+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,778|INFO|python.py:118|Done. Returned value was: None
>2021-03-19 10:22:17,787|INFO|taskinstance.py:1166|Marking task as SUCCESS.
dag_id=dummy, task_id=init, execution_date=20210319T102111,
start_date=20210319T102217, end_date=20210319T102217
>2021-03-19 10:22:17,818|INFO|taskinstance.py:1220|2 downstream tasks
scheduled from follow-on schedule check
>2021-03-19 10:22:17,853|INFO|local_task_job.py:146|Task exited with return
code 0
```