[ 
https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290900#comment-16290900
 ] 

Ash Berlin-Taylor commented on AIRFLOW-1916:
--------------------------------------------

Doesn't help, as it's always called from an {{atexit}} handler registered why 
the logging stdlib module is imported.

> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>                 Key: AIRFLOW-1916
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>             Fix For: 1.9.1
>
>
> If using the S3TaskHandler logger the contents of the log file in the S3 
> bucket end up duplicated - once from when `airflow run --raw` finalizes the 
> task, and again from when `airflow run --local` finalizes it's logger.
> Log from the UI included below. The file on disk does not have the repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not 
> meant to upload, but something is.
> {noformat}
> *** Reading remote log from 
> s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing 
> <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 
> --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing 
> <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 
> --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): s3.eu-west-1.amazonaws.com
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to