Ash Berlin-Taylor created AIRFLOW-1916:
------------------------------------------
Summary: S3 Task logs end up duplicated in the file.
Key: AIRFLOW-1916
URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
Project: Apache Airflow
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Ash Berlin-Taylor
Fix For: 1.9.1
If using the S3TaskHandler logger the contents of the log file in the S3 bucket
end up duplicated - once from when `airflow run --raw` finalizes the task, and
again from when `airflow run --local` finalizes it's logger.
Log from the UI included below. The file on disk does not have the repetition.
There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not
meant to upload, but something is.
{noformat}
*** Reading remote log from
s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
[2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
[2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for
<TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
[2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for
<TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
[2017-12-13 10:45:49,906] {models.py:1407} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
[2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing
<Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
[2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash',
'-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5
--raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
[2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,622] {configuration.py:206} WARNING - section/key
[celery/celery_ssl_active] not found in config
[2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without
SSL
[2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
[2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,754] {models.py:189} INFO - Filling up the DagBag from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
[2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
[2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
[2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
[2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for
<TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
[2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for
<TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
[2017-12-13 10:45:49,906] {models.py:1407} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
[2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing
<Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
[2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash',
'-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5
--raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
[2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,622] {configuration.py:206} WARNING - section/key
[celery/celery_ssl_active] not found in config
[2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without
SSL
[2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
[2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,754] {models.py:189} INFO - Filling up the DagBag from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
[2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from
/usr/local/airflow/dags/example/csv_to_parquet.py
[2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
[2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
[2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1):
sts.amazonaws.com
[2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1):
s3.eu-west-1.amazonaws.com
[2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13
10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1):
s3.eu-west-1.amazonaws.com
{noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)