[ 
https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297348#comment-16297348
 ] 

ASF subversion and git services commented on AIRFLOW-1916:
----------------------------------------------------------

Commit 4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd in incubator-airflow's branch 
refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4ce4faa ]

[AIRFLOW-1916] Don't upload logs to remote from `run --raw`

In a previous change we removed the airflow.task.raw handler (which printed
to stdout directly) and replaced it with one that wrote to the log file
itself. The problem is that Python automatically calls `logging.shutdown()`
itself on clean process exit. This ended up uploading the log file twice:
once from the end of `airflow run --raw`, and then again from the explicit
shutdown() call at the end of cli's `run()`.

Since logging is shut down automatically, this change adds an explicit flag
to control whether the GCS and S3 handlers should upload the file or not,
and we tell them not to when running with `--raw`.

Closes #2880 from ashb/AIRFLOW-1916-dont-upload-logs-twice
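
To make the shape of the fix concrete, here is a minimal sketch (not the
actual Airflow S3TaskHandler/GCSTaskHandler code; the class name and the
`upload_on_close` flag are made up for illustration) of a file handler that
only ships its log to remote storage on close() when told to, so the `--raw`
subprocess can be configured to skip the upload while the wrapping
`airflow run --local` process still performs it:

{noformat}
import logging


class RemoteLogHandler(logging.FileHandler):
    """Hypothetical handler that uploads its log file to remote storage on close()."""

    def __init__(self, filename, upload_on_close=True):
        super().__init__(filename)
        # Explicit flag, per the commit above: the `--raw` process would set
        # this to False so only the `--local` wrapper uploads the file.
        self.upload_on_close = upload_on_close
        self._closed = False

    def close(self):
        # Both an explicit logging.shutdown() and the atexit-registered one
        # end up calling close(), so guard against doing the work twice.
        if not self._closed:
            self._closed = True
            if self.upload_on_close:
                self._upload()
        super().close()

    def _upload(self):
        # Placeholder for copying self.baseFilename to S3/GCS.
        print("uploading %s to remote storage" % self.baseFilename)
{noformat}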


> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>                 Key: AIRFLOW-1916
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>             Fix For: 1.9.1, 2.0.0
>
>
> If using the S3TaskHandler logger, the contents of the log file in the S3
> bucket end up duplicated: once from when `airflow run --raw` finalizes the
> task, and again from when `airflow run --local` finalizes its logger.
> The log from the UI is included below. The file on disk does not have the
> repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is
> not meant to upload, but something is. (A toy sketch of the doubling follows
> the log excerpt below.)
> {noformat}
> *** Reading remote log from 
> s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing 
> <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 
> --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for 
> <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing 
> <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', 
> '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 
> --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key 
> [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor 
> will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor 
> CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): s3.eu-west-1.amazonaws.com
> {noformat}
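
As a toy illustration of the symptom (not Airflow code, and assuming, as the
"duplicated remotely but clean on disk" behaviour suggests, that the remote
handler appends the local log to whatever is already stored remotely),
finalising the logger from both the `--raw` subprocess and the `--local`
wrapper doubles the remote copy:

{noformat}
remote = ""                       # stands in for the S3 object
local_log = "line 1\nline 2\n"    # contents of the on-disk log file


def finalise(local):
    """What each process does when its logger shuts down: append local to remote."""
    global remote
    remote = remote + local


finalise(local_log)   # end of `airflow run --raw`
finalise(local_log)   # explicit shutdown() at the end of cli's run()

print(remote)         # the log content appears twice in the remote copy
{noformat}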



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
