selimchergui opened a new pull request, #37055:
URL: https://github.com/apache/airflow/pull/37055

   **Description**
   `SqlToS3Operator` creates an S3 object from the output of a SQL query. Unless 
you specify a `groupby_kwargs` parameter, the entire result set is written to a 
single object. This can cause data pipeline issues, especially when the output 
files are consumed by workers with limited resources (like pods).
   So I suggest adding a new parameter, `max_rows_per_file`, to limit the size of 
the destination files and split the output data across multiple files.
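   The intended behaviour could be sketched as follows. This is a minimal illustration, not the actual operator code; `split_dataframe` is a hypothetical helper showing how `max_rows_per_file` would partition the query result before each chunk is uploaded as its own S3 object:

   ```python
   import pandas as pd

   # Hypothetical helper illustrating the proposed max_rows_per_file
   # behaviour; it is not part of SqlToS3Operator today.
   def split_dataframe(df: pd.DataFrame, max_rows_per_file: int):
       """Yield successive chunks of at most max_rows_per_file rows each."""
       for start in range(0, len(df), max_rows_per_file):
           yield df.iloc[start:start + max_rows_per_file]

   # 10 rows with max_rows_per_file=4 -> three files of 4, 4 and 2 rows
   df = pd.DataFrame({"id": range(10)})
   chunks = list(split_dataframe(df, max_rows_per_file=4))
   print([len(c) for c in chunks])  # [4, 4, 2]
   ```

   Each chunk would then be serialized and written as a separate object, so no single destination file exceeds the configured row limit.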
   
   **Use case/motivation**
   I faced many out-of-memory issues when trying to process the `SqlToS3Operator` 
destination object. With such a feature, the output files would have a 
predictable size, and compute workers could be sized accordingly.
   
   **Related issues**
   No response
   
   **Are you willing to submit a PR?**
    Yes I am willing to submit a PR!
   
   **Code of Conduct**
    I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
