André Pinto created AIRFLOW-2067:
------------------------------------

             Summary: Scheduler abruptly failing tasks
                 Key: AIRFLOW-2067
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2067
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler, worker
    Affects Versions: 1.9.0
            Reporter: André Pinto


We have a massive DAG with hundreds of tasks (responsible for orchestrating the 
daily conversion of all our data sets from JSON into Parquet). Since we 
updated to the latest version (1.9.0), we have occasionally been getting 
apparently random failures on some of these tasks.

When a task fails this way, its log is not uploaded to S3, but I can still find 
it on the local file system. The logs are not very useful, though. Example:
root@efe05e677183:~/airflow/logs/emr_json_to_parquet_rescheduled/conversion_delete_output_prod.consolidated_user_search_deduper.all_user_searches_dedup.user_search.Search/2018-01-25T06:00:00# cat 1.log
[2018-01-26 06:01:18,118] {cli.py:374} INFO - Running on host efe05e677183
[2018-01-26 06:01:18,253] {models.py:1197} INFO - Dependencies all met for <TaskInstance: emr_json_to_parquet_rescheduled.conversion_delete_output_prod.consolidated_user_search_deduper.all_user_searches_dedup.user_search.Search 2018-01-25 06:00:00 [queued]>
[2018-01-26 06:01:19,202] {models.py:1197} INFO - Dependencies all met for <TaskInstance: emr_json_to_parquet_rescheduled.conversion_delete_output_prod.consolidated_user_search_deduper.all_user_searches_dedup.user_search.Search 2018-01-25 06:00:00 [queued]>
[2018-01-26 06:01:19,202] {models.py:1407} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 3
--------------------------------------------------------------------------------
 

The logs of all the failed tasks look like this, as if the process had been 
killed right after starting, before it had time to upload the log file to S3.
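
To get an idea of how many task logs end this way, a rough Python sketch like 
the one below could be run against the local log directory. It treats a log as 
cut short when nothing but blank lines or dashed separators follows the last 
"Starting attempt" banner. The function names and this heuristic are 
assumptions made for illustration; they are not part of Airflow.

```python
import os

BANNER = "Starting attempt"  # banner text as it appears in the log excerpt above


def is_truncated(log_text):
    """Return True if nothing substantive follows the last 'Starting attempt' banner.

    Heuristic only (an assumption, not Airflow behaviour): blank lines and
    lines made up solely of dashes are ignored when looking for later output.
    """
    lines = [line.strip() for line in log_text.splitlines()]
    banner_positions = [i for i, line in enumerate(lines) if line.startswith(BANNER)]
    if not banner_positions:
        return False  # no banner at all, so this is a different kind of log
    tail = [
        line
        for line in lines[banner_positions[-1] + 1:]
        if line and set(line) != {"-"}
    ]
    return not tail


def find_truncated_logs(log_root):
    """Walk log_root and return the sorted paths of *.log files that look cut short."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            if not name.endswith(".log"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                if is_truncated(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

For example, `find_truncated_logs(os.path.expanduser("~/airflow/logs"))` would 
list the candidate files under the log directory shown in the excerpt above.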

Our Airflow instance is running in LocalExecutor mode.

Other smaller DAGs do not seem to experience this problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
