[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195581#comment-16195581 ]

Allison Wang commented on AIRFLOW-1667:
---------------------------------------

I agree that we shouldn't rely on the logging module's close() to upload the 
log, since we have no control over when it's called. Instead of calling close, 
we could explicitly invoke a post_task_run method that handles any additional 
cleanup/operations upon task completion. This change only requires modifying a 
small amount of the current code. I am not exactly sure how to upload the log 
to remote storage like S3/GCS periodically during task execution, but it's 
possible to use a log collector (e.g. Filebeat) to ship the log to centralized 
storage (e.g. Elasticsearch) in real time. 
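To make the idea concrete, here is a minimal sketch of the hook-based approach. The names (RemoteTaskHandler, post_task_run, finish_task) and the no-op _upload are hypothetical stand-ins, not the actual Airflow implementation; the point is only that the worker calls the hook explicitly at task completion instead of waiting for logging.shutdown() to close the handler.

```python
import logging
import tempfile

class RemoteTaskHandler(logging.FileHandler):
    """Writes locally like logging.FileHandler, but exposes an explicit
    post_task_run() hook so the worker can upload the log when the task
    finishes, without relying on close() being called at process exit."""

    def __init__(self, filename):
        super().__init__(filename)
        self.uploaded = False  # stands in for the real upload's side effect

    def _upload(self):
        # Placeholder: a real handler would push self.baseFilename
        # to S3/GCS here.
        self.uploaded = True

    def post_task_run(self):
        # Explicit completion hook: flush buffered records to disk,
        # then ship the file to remote storage.
        self.flush()
        self._upload()

def finish_task(logger):
    # Called by the worker after a task completes: trigger the upload
    # hook on any handler that provides one. The handler stays open,
    # so a long-lived worker process can reuse it for the next task.
    for handler in logger.handlers:
        hook = getattr(handler, "post_task_run", None)
        if hook is not None:
            hook()

log_path = tempfile.NamedTemporaryFile(suffix=".log", delete=False).name
logger = logging.getLogger("task")
handler = RemoteTaskHandler(log_path)
logger.addHandler(handler)
logger.warning("task finished")
finish_task(logger)
```

Because the hook is invoked per task rather than per process, logs are uploaded even when the executor keeps worker processes alive between tasks.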

> Remote log handlers don't upload logs
> -------------------------------------
>
>                 Key: AIRFLOW-1667
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: logging
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used, worker processes may not shut down regularly and can 
> very likely persist between tasks. This means that during normal execution, 
> log files are never uploaded.
>
> We need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)