[ https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297348#comment-16297348 ]
ASF subversion and git services commented on AIRFLOW-1916:
----------------------------------------------------------

Commit 4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd in incubator-airflow's branch refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4ce4faa ]

[AIRFLOW-1916] Don't upload logs to remote from `run --raw`

In a previous change we removed the airflow.task.raw handler (which printed to stdout directly) and replaced it with one that wrote to the log file itself.

The problem is that Python automatically calls `logging.shutdown()` on clean process exit. This ended up uploading the log file twice: once at the end of `airflow run --raw`, and again from the explicit shutdown() call at the end of the CLI's `run()`.

Since logging is shut down automatically, this change adds an explicit flag to control whether the GCS and S3 handlers should upload the file, and we tell them not to when running with `--raw`.

Closes #2880 from ashb/AIRFLOW-1916-dont-upload-logs-twice

> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>              Key: AIRFLOW-1916
>              URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>          Project: Apache Airflow
>       Issue Type: Bug
> Affects Versions: 1.9.0
>         Reporter: Ash Berlin-Taylor
>          Fix For: 1.9.1, 2.0.0
>
> If using the S3TaskHandler logger, the contents of the log file in the S3
> bucket end up duplicated - once from when `airflow run --raw` finalizes the
> task, and again from when `airflow run --local` finalizes its logger.
> The log from the UI is included below. The file on disk does not have the
> repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not
> meant to upload, but something is.
> {noformat}
> *** Reading remote log from s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)