pippo995 opened a new pull request, #62078:
URL: https://github.com/apache/airflow/pull/62078

   ## Summary
   
   - `S3Hook.download_file()` writes S3 object content to a file via 
`download_fileobj()` but never calls `flush()` before returning the file path
   - When the caller immediately reads the returned path, the file may contain 
0 bytes because data is still in Python's write buffer
   - Added `file.flush()` after `download_fileobj()` to ensure buffered content is written to disk before the path is returned (see the sketch after this list)
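
   As a rough sketch of the fix in context (illustrative only: the helper name, the `NamedTemporaryFile` usage, and the bare boto3 client call below are assumptions, not the provider's actual code):

```python
from tempfile import NamedTemporaryFile

import boto3


def download_file_sketch(bucket_name: str, key: str, local_dir: str) -> str:
    """Illustrative sketch: download an S3 object and return the local path."""
    s3 = boto3.client("s3")
    # The file is deliberately not opened in a `with` block, mirroring the
    # current code path where nothing closes it before the path is returned.
    file = NamedTemporaryFile(dir=local_dir, prefix="airflow_tmp_", delete=False)
    s3.download_fileobj(bucket_name, key, file)
    file.flush()  # the fix: push buffered bytes to disk before exposing the path
    return file.name
```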
   
   ## Details
   
   The original implementation used a `with` context manager, which automatically closed (and flushed) the file on exit. When `preserve_file_name` support was added, the `with` block was removed, so the file is now left open and unflushed when its path is returned.
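
   For contrast, a minimal sketch of the pre-`preserve_file_name` shape (illustrative, not the exact original code), showing where the implicit flush used to happen:

```python
from tempfile import NamedTemporaryFile


def download_file_old_sketch(s3_client, bucket_name: str, key: str, local_dir: str) -> str:
    # Leaving the `with` block closes the file, which flushes the buffer,
    # so callers reading the returned path always saw the complete object.
    with NamedTemporaryFile(dir=local_dir, prefix="airflow_tmp_", delete=False) as file:
        s3_client.download_fileobj(bucket_name, key, file)
        return file.name
```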
   
   This particularly affects small files (< ~8 KB) whose contents fit entirely in Python's default write buffer. The bug is latent in all environments, but it was exposed by `apache-airflow-providers-common-compat==1.13.1` (PR #61157), which changed when `get_hook_lineage_collector()` executes between `download_fileobj()` and `return file.name`.
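
   The buffering effect can be reproduced without S3 at all: a payload smaller than Python's default buffer (`io.DEFAULT_BUFFER_SIZE`, 8192 bytes) stays in memory until the file is flushed or closed, so reading the path back too early yields an empty file.

```python
import io
import os
import tempfile

payload = b"x" * 1024
assert len(payload) < io.DEFAULT_BUFFER_SIZE  # well under the ~8 KB buffer

file = tempfile.NamedTemporaryFile(delete=False)
file.write(payload)

# The bytes are still in the writer's buffer, not on disk, so the file
# appears empty to anyone who reads the path right now.
print(os.path.getsize(file.name))  # typically 0 at this point

file.flush()
print(os.path.getsize(file.name))  # 1024 once the buffer is flushed

file.close()
os.unlink(file.name)
```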

