trlopes1974 commented on issue #39717: URL: https://github.com/apache/airflow/issues/39717#issuecomment-2263227474
@potiuk . I have managed to get some debug into this. Maybe this can help??? So we are talking about the problem where the CELERY task is marked as failed but the AIRFLOW task is successful Celery Task:  Airflow Task.  Airflow Task LOG: ``` 1 *** Found local files: *** * /opt/tkapp/airflow/logs/dag_id=CS00007002_Correcao_Dados_Oracle/run_id=CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00/task_id=remote_actions.ssh_command_remove_operator/attempt=1.log [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:57} DEBUG - Vault:[https://VAULT.SERVER.NAME], namespace:[admin/SOMENAMESPACE], proxy:[{"http": "http://PROXYSERVER:443" ,"https": "http://PROXYSERVER:443"}] [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:85} WARNING - a inicializar com memcache: cachehost:memcache.SERVERHOSTNAME.knb , cacheport:11211 , cachettl:2592000 , cachecrt:/opt/tkapp/ssl/SERVERHOSTNAME_knb_combined.pem [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:144} DEBUG - performing authentication to: https://VAULT.SERVER.NAME/v1/auth/approle/login -> role autentication: roleid:36 secret:36 result:640 [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:164} DEBUG - Got token:110 [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:192} DEBUG - Cached secret config/sql_alchemy_conn: 2 [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:57} DEBUG - Vault:[https://VAULT.SERVER.NAME], namespace:[admin/SOMENAMESPACE], proxy:[{"http": "http://PROXYSERVER:443" ,"https": "http://PROXYSERVER:443"}] [2024-08-01, 12:35:46 WEST] {hashicorpvaultapi.py:85} WARNING - a inicializar com memcache: cachehost:memcache.SERVERHOSTNAME.knb , cacheport:11211 , cachettl:2592000 , cachecrt:/opt/tkapp/ssl/SERVERHOSTNAME_knb_combined.pem [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:144} DEBUG - performing authentication to: https://VAULT.SERVER.NAME/v1/auth/approle/login -> role autentication: roleid:36 secret:36 result:640 [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:164} DEBUG - Got token:110 [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:192} DEBUG - Cached secret config/result_backend: 2 [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:57} DEBUG - Vault:[https://VAULT.SERVER.NAME], namespace:[admin/SOMENAMESPACE], proxy:[{"http": "http://PROXYSERVER:443" ,"https": "http://PROXYSERVER:443"}] [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:85} WARNING - a inicializar com memcache: cachehost:memcache.SERVERHOSTNAME.knb , cacheport:11211 , cachettl:2592000 , cachecrt:/opt/tkapp/ssl/SERVERHOSTNAME_knb_combined.pem [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:144} DEBUG - performing authentication to: https://VAULT.SERVER.NAME/v1/auth/approle/login -> role autentication: roleid:36 secret:36 result:640 [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:164} DEBUG - Got token:110 [2024-08-01, 12:35:47 WEST] {hashicorpvaultapi.py:192} DEBUG - Cached secret config/broker_url: 2 [2024-08-01, 12:35:47 WEST] {local_task_job_runner.py:120} ▼ Pre task execution logs [2024-08-01, 12:35:47 WEST] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: CS00007002_Correcao_Dados_Oracle.remote_actions.ssh_command_remove_operator CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00 [queued]> [2024-08-01, 12:35:47 WEST] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: CS00007002_Correcao_Dados_Oracle.remote_actions.ssh_command_remove_operator CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00 [queued]> [2024-08-01, 12:35:47 WEST] {taskinstance.py:2306} INFO - Starting attempt 1 of 1 [2024-08-01, 12:35:47 WEST] {taskinstance.py:2330} INFO - Executing <Task(SSHOperator): remote_actions.ssh_command_remove_operator> on 2024-08-01 11:34:08.393962+00:00 [2024-08-01, 12:35:47 WEST] {standard_task_runner.py:63} INFO - Started process 2987081 to run task [2024-08-01, 12:35:47 WEST] {standard_task_runner.py:96} INFO - Running: ['airflow', 'tasks', 'run', 'CS00007002_Correcao_Dados_Oracle', 'remote_actions.ssh_command_remove_operator', 'CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', '--job-id', '515545', '--raw', '--subdir', 'DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py', '--cfg-path', '/tmp/tmpdx2hhjj6'] [2024-08-01, 12:35:47 WEST] {standard_task_runner.py:97} INFO - Job 515545: Subtask remote_actions.ssh_command_remove_operator [2024-08-01, 12:35:47 WEST] {task_command.py:426} INFO - Running <TaskInstance: CS00007002_Correcao_Dados_Oracle.remote_actions.ssh_command_remove_operator CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00 [running]> on host SERVERHOSTNAME [2024-08-01, 12:35:48 WEST] {taskinstance.py:2648} INFO - Exporting env vars: AIRFLOW_CTX_DAG_EMAIL='cs-notificacoes-criticas-prod - PT Tribo2 Novobanco <[email protected]>' AIRFLOW_CTX_DAG_OWNER='ttauto' AIRFLOW_CTX_DAG_ID='CS00007002_Correcao_Dados_Oracle' AIRFLOW_CTX_TASK_ID='remote_actions.ssh_command_remove_operator' AIRFLOW_CTX_EXECUTION_DATE='2024-08-01T11:34:08.393962+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00' [2024-08-01, 12:35:48 WEST] {taskinstance.py:430} ▲▲▲ Log group end [2024-08-01, 12:35:48 WEST] {ssh.py:151} INFO - Creating ssh_client [2024-08-01, 12:35:48 WEST] {ssh.py:302} WARNING - No Host Key Verification. This won't protect against Man-In-The-Middle attacks [2024-08-01, 12:35:48 WEST] {transport.py:1909} INFO - Connected (version 2.0, client OpenSSH_8.0) [2024-08-01, 12:35:48 WEST] {transport.py:1909} INFO - Auth banner: b"################################### AVISO #########################################\n\nOs sistemas internos SOME NAME so' devem ser usados para realizar atividades de \nnegocio do SOME NAME ou outros fins autorizados pela Direcao do SOME NAME \xe2\x80\x93 \nDepartamento de Sistemas de Informacao (DSI)\n\n===============================================================================\n= O DIREITO DE ACESSO A ESTE SISTEMA E' RESERVADO !! =\n===============================================================================\n= =\n= Este sistema deve ser utilizado apenas em actividades de negocio =\n= autorizadas pela Gestao do SOME NAME (DSI). =\n= =\n===================================================== ==========================\n= Este sistema esta' sujeito a auditorias efectuadas a qualquer momento. =\n===============================================================================\n" [2024-08-01, 12:35:48 WEST] {transport.py:1909} INFO - Authentication (publickey) successful! [2024-08-01, 12:35:48 WEST] {ssh.py:483} INFO - Running command: /usr/bin/sudo -s -- eval '/usr/bin/su - oradb -c "/home/oradb/correcao_dados/correcao_dados.sh WO0000000813598 REMOVE"' [2024-08-01, 12:35:48 WEST] {ssh.py:529} INFO - a remover /tmp/correcao_dados/WO0000000813598 [2024-08-01, 12:35:48 WEST] {ssh.py:529} INFO - ACCAO 'REMOVE' EXECUTADA PARA A WORKORDERID:'WO0000000813598' [2024-08-01, 12:35:48 WEST] {taskinstance.py:441} ▼ Post task execution logs [2024-08-01, 12:35:48 WEST] {taskinstance.py:1206} INFO - Marking task as SUCCESS. dag_id=CS00007002_Correcao_Dados_Oracle, task_id=remote_actions.ssh_command_remove_operator, run_id=CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00, execution_date=20240801T113408, start_date=20240801T113547, end_date=20240801T113548 [2024-08-01, 12:35:48 WEST] {logging_mixin.py:188} INFO - log_config - locating: LOGLEVEL / modules.dagutililies [2024-08-01, 12:35:48 WEST] {dagutililies.py:5} WARNING - current log level for modules.dagutililies is: DEBUG [2024-08-01, 12:35:48 WEST] {hashicorpvaultapi.py:192} DEBUG - Cached secret variables/RABBIT: 9 [2024-08-01, 12:35:48 WEST] {dagutililies.py:159} INFO - *********** SUCCESS CALLBACK START *********** [2024-08-01, 12:35:48 WEST] {hashicorpvaultapi.py:192} DEBUG - Cached secret variables/TKAPP: 10 [2024-08-01, 12:35:48 WEST] {dagutililies.py:170} INFO - *********** SUCCESS CALLBACK END *********** [2024-08-01, 12:35:48 WEST] {local_task_job_runner.py:240} INFO - Task exited with return code 0 [2024-08-01, 12:35:48 WEST] {local_task_job_runner.py:222} ▲▲▲ Log group end ``` **Extra debug info on **celery_executor_utils.py**** ``` [2ce8b927-3f7b-42b9-831f-8c44b13b5eaa] Executing command in Celery: ['airflow', 'tasks', 'run', 'CS00007002_Correcao_Dados_Oracle', 'remote_actions.ssh_command_remove_operator', 'CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', '--local', '--subdir', 'DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py'] Execute command: ['airflow', 'tasks', 'run', 'CS00007002_Correcao_Dados_Oracle', 'remote_actions.ssh_command_remove_operator', 'CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', '--local', '--subdir', 'DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py'] for celery_task_id:2ce8b927-3f7b-42b9-831f-8c44b13b5eaa Execute _execute_in_fork: ['airflow', 'tasks', 'run', 'CS00007002_Correcao_Dados_Oracle', 'remote_actions.ssh_command_remove_operator', 'CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', '--local', '--subdir', 'DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py'] for celery_task_id:2ce8b927-3f7b-42b9-831f-8c44b13b5eaa got PID:2987051 fork pid: 2987051 - celery_task_id:2ce8b927-3f7b-42b9-831f-8c44b13b5eaa Execute _execute_in_fork: ['airflow', 'tasks', 'run', 'CS00007002_Correcao_Dados_Oracle', 'remote_actions.ssh_command_remove_operator', 'CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', '--local', '--subdir', 'DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py'] for celery_task_id:2ce8b927-3f7b-42b9-831f-8c44b13b5eaa got PID:0 calling func 'task_run' with args Namespace(subcommand='run', dag_id='CS00007002_Correcao_Dados_Oracle', task_id='remote_actions.ssh_command_remove_operator', execution_date_or_run_id='CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', cfg_path=None, depends_on_past='check', force=False, ignore_all_dependencies=False, ignore_dependencies=False, ignore_depends_on_past=False, interactive=False, job_id=None, local=True, map_index=-1, mark_success=False, shut_down_logging=False, pickle=None, pool=None, raw=False, read_from_db=False, ship_dag=False, subdir='DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py', verbose=False, func=<function lazy_load_command.<locals>.command at 0x7fb74870b550>, external_executor_id='2ce8b927-3f7b-42b9-831f-8c44b13b5eaa') Execute _execute_in_fork: calling func 'task_run' with args Namespace(subcommand='run', dag_id='CS00007002_Correcao_Dados_Oracle', task_id='remote_actions.ssh_command_remove_operator', execution_date_or_run_id='CS00007002_Correcao_Dados_Oracle_WO0000000813598_2024-08-01T12:34:08.391542+01:00', cfg_path=None, depends_on_past='check', force=False, ignore_all_dependencies=False, ignore_dependencies=False, ignore_depends_on_past=False, interactive=False, job_id=None, local=True, map_index=-1, mark_success=False, shut_down_logging=False, pickle=None, pool=None, raw=False, read_from_db=False, ship_dag=False, subdir='DAGS_FOLDER/CS00007002_Correcao_Dados_Oracle.py', verbose=False, func=<function lazy_load_command.<locals>.command at 0x7fb74870b550>, external_executor_id='2ce8b927-3f7b-42b9-831f-8c44b13b5eaa') [2ce8b927-3f7b-42b9-831f-8c44b13b5eaa] Failed to execute task. [2ce8b927-3f7b-42b9-831f-8c44b13b5eaa] Failed to execute task. Traceback (most recent call last): File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 194, in _execute_in_fork args.func(args) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/cli/cli_config.py", line 49, in command return func(*args, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/utils/cli.py", line 114, in wrapper return f(*args, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 441, in task_run task_return_code = _run_task_by_selected_method(args, _dag, ti) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 219, in _run_task_by_selected_method return _run_task_by_local_task_job(args, ti) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 281, in _run_task_by_local_task_job ret = run_job(job=job_runner.job, execute_callable=job_runner._execute) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/utils/session.py", line 79, in wrapper return func(*args, session=session, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/jobs/job.py", line 402, in run_job return execute_job(job, execute_callable=execute_callable) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/jobs/job.py", line 431, in execute_job ret = execute_callable() File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/jobs/local_task_job_runner.py", line 201, in _execute self.handle_task_exit(return_code) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/jobs/local_task_job_runner.py", line 244, in handle_task_exit self.task_instance.schedule_downstream_tasks(max_tis_per_query=self.job.max_tis_per_query) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/utils/session.py", line 79, in wrapper return func(*args, session=session, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/taskinstance.py", line 3523, in schedule_downstream_tasks return TaskInstance._schedule_downstream_tasks( File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/api_internal/internal_api_call.py", line 115, in wrapper return func(*args, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/utils/session.py", line 76, in wrapper return func(*args, **kwargs) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/taskinstance.py", line 3473, in _schedule_downstream_tasks partial_dag = task.dag.partial_subset( File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/dag.py", line 2550, in partial_subset dag.task_dict = { File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/dag.py", line 2551, in <dictcomp> t.task_id: _deepcopy_task(t) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/dag.py", line 2548, in _deepcopy_task return copy.deepcopy(t, memo) File "/usr/lib64/python3.9/copy.py", line 153, in deepcopy y = copier(memo) File "/opt/tkapp/env_airflow/lib64/python3.9/site-packages/airflow/models/baseoperator.py", line 1307, in __deepcopy__ setattr(result, k, copy.deepcopy(v, memo)) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 172, in deepcopy y = _reconstruct(x, memo, *rv) File "/usr/lib64/python3.9/copy.py", line 270, in _reconstruct state = deepcopy(state, memo) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 172, in deepcopy y = _reconstruct(x, memo, *rv) File "/usr/lib64/python3.9/copy.py", line 270, in _reconstruct state = deepcopy(state, memo) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 172, in deepcopy y = _reconstruct(x, memo, *rv) File "/usr/lib64/python3.9/copy.py", line 270, in _reconstruct state = deepcopy(state, memo) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 172, in deepcopy y = _reconstruct(x, memo, *rv) File "/usr/lib64/python3.9/copy.py", line 270, in _reconstruct state = deepcopy(state, memo) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 172, in deepcopy y = _reconstruct(x, memo, *rv) File "/usr/lib64/python3.9/copy.py", line 270, in _reconstruct state = deepcopy(state, memo) File "/usr/lib64/python3.9/copy.py", line 146, in deepcopy y = copier(x, memo) File "/usr/lib64/python3.9/copy.py", line 230, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "/usr/lib64/python3.9/copy.py", line 161, in deepcopy rv = reductor(4) TypeError: cannot pickle '_cffi_backend.FFI' object waitpid PID/RET: 2987051/256 Execute _execute_in_fork: waitpid PID/RET: 2987051/256 Execute _execute_in_fork: Celery command failed on host: slautop02 with celery_task_id 2ce8b927-3f7b-42b9-831f-8c44b13b5eaa (PID: 2987051, Return Code: 256) Execute command ERROR:Celery command failed on host: slautop02 with celery_task_id 2ce8b927-3f7b-42b9-831f-8c44b13b5eaa (PID: 2987051, Return Code: 256) ERROR executing: Celery command failed on host: slautop02 with celery_task_id 2ce8b927-3f7b-42b9-831f-8c44b13b5eaa (PID: 2987051, Return Code: 256) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
