dcardinha opened a new issue #21265:
URL: https://github.com/apache/airflow/issues/21265


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
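   We are seeing scheduler crashes after the DAG file processor times out while processing a DAG: the `DagFileProcessorManager` exits with exit code 1 and is re-launched.
   
   An example (log lines are newest first):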
   
   ```
   2022-02-01T07:28:12.991+00:00 | ts="2022-02-01 07:28:12,991" class="airflow.dag_processing.manager.DagFileProcessorAgent" level="INFO" msg="Launched DagFileProcessorManager with pid: 297433"
   2022-02-01T07:28:12.988+00:00 | ts="2022-02-01 07:28:12,988" class="airflow.dag_processing.manager.DagFileProcessorAgent" level="WARNING" msg="DagFileProcessorManager (PID=285072) exited with exit code 1 - re-launching"
   2022-02-01T07:28:12.778+00:00 | Process ForkProcess-40:
   2022-02-01T07:28:12.778+00:00 | Traceback (most recent call last):
   2022-02-01T07:28:12.778+00:00 | File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
   2022-02-01T07:28:12.778+00:00 | self.run()
   2022-02-01T07:28:12.778+00:00 | File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
   2022-02-01T07:28:12.778+00:00 | self._target(*self._args, **self._kwargs)
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/manager.py", line 287, in _run_processor_manager
   2022-02-01T07:28:12.778+00:00 | processor_manager.start()
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/manager.py", line 520, in start
   2022-02-01T07:28:12.778+00:00 | return self._run_parsing_loop()
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/manager.py", line 580, in _run_parsing_loop
   2022-02-01T07:28:12.778+00:00 | self._collect_results_from_processor(processor)
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/manager.py", line 904, in _collect_results_from_processor
   2022-02-01T07:28:12.778+00:00 | if processor.result is not None:
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/processor.py", line 322, in result
   2022-02-01T07:28:12.778+00:00 | if not self.done:
   2022-02-01T07:28:12.778+00:00 | File "/usr/local/lib/python3.7/dist-packages/airflow/dag_processing/processor.py", line 287, in done
   2022-02-01T07:28:12.778+00:00 | if self._parent_channel.poll():
   2022-02-01T07:28:12.778+00:00 | File "/usr/lib/python3.7/multiprocessing/connection.py", line 255, in poll
   2022-02-01T07:28:12.778+00:00 | self._check_closed()
   2022-02-01T07:28:12.778+00:00 | File "/usr/lib/python3.7/multiprocessing/connection.py", line 136, in _check_closed
   2022-02-01T07:28:12.778+00:00 | raise OSError("handle is closed")
   2022-02-01T07:28:12.778+00:00 | OSError: handle is closed
   2022-02-01T07:28:10.985+00:00 | ts="2022-02-01 07:28:10,985" class="airflow.dag_processing.processor.DagFileProcessorProcess" level="WARNING" msg="Killing DAGFileProcessorProcess (PID=296516)"
   2022-02-01T07:28:10.979+00:00 | ts="2022-02-01 07:28:10,979" class="airflow.processor_manager" level="ERROR" msg="Processor for /srv/app/dags/bi_lookml/dag.py with PID 296516 started at 2022-02-01T07:26:10.857693+00:00 has timed out, killing it
   ```
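
The last frames of the traceback come from the standard library: `Connection.poll()` raises `OSError: handle is closed` when called on a connection that has already been closed. The sketch below reproduces that error outside Airflow. It assumes (the log suggests this but does not prove it) that the processor's parent channel was closed when the timed-out process was killed. `poll_closed_connection` and `safe_poll` are hypothetical names used for illustration only; they are not Airflow functions, and this is not the project's fix.

```python
import multiprocessing


def poll_closed_connection():
    """Poll a pipe end after closing it; return the error message, if any."""
    parent_conn, child_conn = multiprocessing.Pipe()
    child_conn.close()
    # Assumption: this stands in for whatever cleanup closes the channel
    # when a timed-out processor is killed.
    parent_conn.close()
    try:
        parent_conn.poll()
    except OSError as exc:
        return str(exc)
    return None


def safe_poll(conn):
    """Hypothetical guard: treat a closed connection as 'nothing to read'."""
    if conn.closed:
        return False
    return conn.poll()


if __name__ == "__main__":
    print(poll_closed_connection())
```

A caller that checks `conn.closed` first, as `safe_poll` does, would see a killed processor as having no result instead of raising out of the parsing loop.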
   
   We use Airflow 2.2.3 and this happens a couple of times per day. We have been 
increasing the DagBag timeout (in seconds), but that has not eliminated the 
crashes. Could you please advise?
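
As a rough check on which timeout is firing, the timestamps in the log above give the time between the processor start and the kill. The sketch below only does that arithmetic. Reading the result as the effective processor timeout is an assumption, as is the guess that the kill is governed by `[core] dag_file_processor_timeout` rather than by `[core] dagbag_import_timeout`.

```python
from datetime import datetime

# Both timestamps are copied from the log lines in this report.
started = datetime.fromisoformat("2022-02-01T07:26:10.857693+00:00")
killed = datetime.fromisoformat("2022-02-01T07:28:10.979+00:00")

# Seconds the processor ran before the manager reported the timeout.
elapsed = (killed - started).total_seconds()

if __name__ == "__main__":
    print(f"processor ran for {elapsed:.2f} s before being killed")
```

The processor ran for just over 120 seconds before it was killed, so the value to compare against is whichever timeout is currently set to about 120 seconds.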
   
   ### What you expected to happen
   
   The scheduler should not exit when a DAG file processing timeout occurs.
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Ubuntu 18.04.6 LTS (Bionic Beaver)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Kubernetes
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

