jason810496 commented on issue #45079:
URL: https://github.com/apache/airflow/issues/45079#issuecomment-2556220144

   > > Yes. That's exactly how I envisioned solving this problem. @dstandish ?
   > 
   > IIRC this should be fine when task done but may present challenges when task is in flight because at any moment the location of the logs may shift eg from worker to remote storage etc
   
   Taking `S3TaskHandler` as an example: it would require additional refactoring, and `S3Hook` might need a new `read_stream` method that returns a generator-based result:  
   
https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/amazon/aws/log/s3_task_handler.py#L136-L192
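   A minimal sketch of what such a generator could look like (the `read_stream` name and the chunking strategy are assumptions on my side, not existing `S3Hook` API; in the real hook, `body` would be the `StreamingBody` returned by boto3's `get_object`, while here any binary file-like object works):
   
   ```python
   import io
   from typing import BinaryIO, Iterator
   
   def read_stream(body: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
       # Yield fixed-size chunks instead of reading the whole log object into
       # memory at once, so the caller can forward chunks as they arrive.
       while True:
           chunk = body.read(chunk_size)
           if not chunk:
               break
           yield chunk
   
   # Usage with an in-memory stand-in for the S3 object body:
   data = b"log line 1\nlog line 2\n" * 1000
   chunks = list(read_stream(io.BytesIO(data), chunk_size=4096))
   assert b"".join(chunks) == data
   ```
   
   The handler's `_read` path could then iterate over this generator and yield log chunks to the caller instead of materializing the full log string.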
   
   From my perspective, for the `s3_write` case, I would merge the old log stream with the new log stream, flush the merged result to a temporary file, and upload that file with the [`upload_file`](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html) method. Since `upload_file` streams from disk (using multipart uploads for large files), this would keep memory usage bounded rather than holding the entire merged log in memory.
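   A rough sketch of that merge-then-upload flow (the `flush_and_upload` helper and its `upload_file` callback are hypothetical names I'm using for illustration; in the handler the callback would be backed by boto3's `upload_file`):
   
   ```python
   import os
   import tempfile
   from typing import Callable, Iterable
   
   def flush_and_upload(old_stream: Iterable[bytes],
                        new_stream: Iterable[bytes],
                        upload_file: Callable[[str], None]) -> str:
       # Write both streams chunk-by-chunk to a temp file, so only one chunk
       # is ever held in memory, then hand the file path to the uploader.
       with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
           for chunk in old_stream:
               tmp.write(chunk)
           for chunk in new_stream:
               tmp.write(chunk)
           path = tmp.name
       upload_file(path)
       return path
   
   # Usage with a stand-in uploader that just records the uploaded path:
   uploaded = []
   path = flush_and_upload([b"old log\n"], [b"new log\n"], uploaded.append)
   with open(path, "rb") as f:
       assert f.read() == b"old log\nnew log\n"
   assert uploaded == [path]
   os.remove(path)
   ```
   
   The temp file would of course need to be cleaned up after the upload (or created with `delete=True` and uploaded inside the `with` block).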


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
