laserpedro edited a comment on issue #18041: URL: https://github.com/apache/airflow/issues/18041#issuecomment-934113162
@stephenonethree: have you by chance checked https://stackoverflow.com/questions/65380492/why-are-my-airflow-tasks-being-externally-set-to-failed/65380493#65380493 ?

In my case I see two patterns of failures.

Case 1: a task in the DAG takes a long time to finish (because it is doing heavy computations or inserting a large amount of data into a database), and its execution time is greater than or equal to the scheduler heartbeat interval. After incorporating [this patch](https://github.com/apache/airflow/pull/16289/files), which was supposed to fix this issue, I was still getting the error. CPU usage was low on both the scheduler and Postgres, so the problem did not seem to be resource-related. After some digging I found [this answer](https://stackoverflow.com/questions/65380492/why-are-my-airflow-tasks-being-externally-set-to-failed/65380493#65380493) on Stack Overflow and adjusted my config to:

```
[scheduler]
scheduler_heartbeat_sec = 200
scheduler_health_check_threshold = 600
```

I have relaunched the DAGs that take a long time to process (execution time longer than the heartbeat interval), and so far I have not received any SIGTERM signal.

Case 2: a class inheriting from BaseOperator was hammering the scheduler by using a `poke_interval` of less than 1 minute, whereas the official documentation recommends a poke interval of more than one minute to avoid putting too much load on the scheduler.

By fixing the poke interval on the sensors, modifying the config, and incorporating the patch, I finally seem to have something that looks stable on Airflow > 2.0.0.
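As a rough sanity check for the Case 1 settings, the snippet below parses the `[scheduler]` values with Python's `configparser` and confirms that the health-check threshold leaves headroom above the heartbeat interval. It assumes the scheduler health check compares the age of the last scheduler heartbeat against `scheduler_health_check_threshold`, so the threshold must exceed `scheduler_heartbeat_sec`. The 2x margin and the helper names `check_scheduler_timing` and `env_var_name` are my own illustrative assumptions, not Airflow APIs. The environment-variable names follow Airflow's `AIRFLOW__{SECTION}__{KEY}` convention.

```python
import configparser
from typing import Tuple

# Excerpt of airflow.cfg with the values from Case 1.
AIRFLOW_CFG = """
[scheduler]
scheduler_heartbeat_sec = 200
scheduler_health_check_threshold = 600
"""


def check_scheduler_timing(cfg_text: str, min_ratio: float = 2.0) -> Tuple[int, int]:
    """Parse the [scheduler] timing values and check the threshold margin.

    Assumption (illustrative, not an Airflow rule): the health-check threshold
    should be at least `min_ratio` times the heartbeat interval.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    heartbeat = parser.getint("scheduler", "scheduler_heartbeat_sec")
    threshold = parser.getint("scheduler", "scheduler_health_check_threshold")
    if threshold < min_ratio * heartbeat:
        raise ValueError(
            f"health check threshold ({threshold}s) is less than "
            f"{min_ratio}x the heartbeat interval ({heartbeat}s)"
        )
    return heartbeat, threshold


def env_var_name(section: str, key: str) -> str:
    """Build the AIRFLOW__{SECTION}__{KEY} environment-variable name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    hb, thr = check_scheduler_timing(AIRFLOW_CFG)
    # Print the same settings in environment-variable form.
    print(f"{env_var_name('scheduler', 'scheduler_heartbeat_sec')}={hb}")
    print(f"{env_var_name('scheduler', 'scheduler_health_check_threshold')}={thr}")
```

Setting the values through environment variables can be handy in containerized deployments where editing `airflow.cfg` is awkward.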

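To track down sensors like the one in Case 2, a small audit can flag any `poke_interval` under one minute. This is a hypothetical sketch: `SensorSettings`, `find_aggressive_sensors` and `pokes_per_hour` are illustrative names, not Airflow classes or functions. In an actual DAG the fix is simply passing a larger `poke_interval` (in seconds) to the sensor.

```python
from dataclasses import dataclass
from typing import List

# Below this many seconds, a poke interval is treated as too aggressive.
MIN_POKE_INTERVAL_SEC = 60


@dataclass
class SensorSettings:
    """Hypothetical lightweight description of a sensor task (not an Airflow class)."""
    task_id: str
    poke_interval: float  # seconds between pokes
    mode: str = "poke"    # Airflow sensors accept "poke" or "reschedule"


def find_aggressive_sensors(sensors: List[SensorSettings]) -> List[str]:
    """Return the task_ids whose poke_interval is under one minute."""
    return [s.task_id for s in sensors if s.poke_interval < MIN_POKE_INTERVAL_SEC]


def pokes_per_hour(poke_interval: float) -> int:
    """Upper bound on how many times a sensor checks its condition per hour."""
    return int(3600 // poke_interval)


if __name__ == "__main__":
    sensors = [
        SensorSettings("wait_for_file", poke_interval=10),
        SensorSettings("wait_for_partition", poke_interval=300, mode="reschedule"),
    ]
    for task_id in find_aggressive_sensors(sensors):
        print(f"{task_id}: poke_interval below {MIN_POKE_INTERVAL_SEC}s")
```

The `pokes_per_hour` helper makes the cost concrete: a 10-second interval means up to 360 checks per hour for a single sensor, compared with 12 at 300 seconds.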