aditya-emmanuel opened a new issue, #50159: URL: https://github.com/apache/airflow/issues/50159
### Apache Airflow Provider(s)

amazon

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.28.0

### Apache Airflow version

2.10.3

### Operating System

macOS

### Deployment

Astronomer

### Deployment details

_No response_

### What happened

While using the SqlToS3Operator, I've noticed that memory usage on the worker often far exceeds the size of the exported data for large datasets. For instance, for a SQL query whose result is a ~5 GB .txt export, the task consumes approximately 10 GB of memory during execution. I believe this is because the operator first loads the entire query result into a pandas DataFrame (in memory), writes it to a local temporary file, and then uploads that file to S3.

#### Temporary file name leakage in .gz files

Additionally, when exporting to .gz files (e.g., example.txt.gz) and opening/unzipping the .gz file on Windows with Windows Explorer, WinRAR, or 7-Zip, the archive reveals the temporary file name that was generated internally by Python's NamedTemporaryFile().

Note: this issue occurs only when opening/unzipping the .gz file on Windows; it does not appear on macOS or Linux.

### What you think should happen instead

- Support direct upload to S3 via file-like buffers to avoid writing to disk
- Control the internal filename for compressed outputs to avoid leaking temporary names

### How to reproduce

DAG snippet:

```python
import csv
from datetime import timedelta

from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

extract_operator = SqlToS3Operator(
    task_id="extract_example_data",
    retries=3,
    retry_delay=timedelta(minutes=1),
    retry_exponential_backoff=True,
    sql_conn_id="redshift_conn",
    query="SELECT * from example;",
    s3_bucket="{{ var.value.S3_BUCKET }}",
    s3_key="example.txt.gz",
    aws_conn_id="aws_datalake_conn",
    file_format="csv",
    replace=True,
    pd_kwargs={
        "sep": "|",
        "index": False,
        "quoting": csv.QUOTE_MINIMAL,
        "compression": "gzip",
    },
)
```

### Anything else

Code reference: https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/transfers/sql_to_s3.py#L195-L203

### Are you willing to submit PR?
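The leakage can be reproduced outside Airflow with just the standard library. A minimal sketch (the temp-file path and sample bytes here are arbitrary): `gzip.open()` on a named path stores the base file name in the gzip header (the FNAME field, flagged by bit 0x08 of the FLG byte), which is the name Windows tools later display, whereas compressing into a file-like object stores no name at all.

```python
import gzip
import io
import os
import tempfile

# Writing through a named path: gzip records the file's base name
# in the header, so a NamedTemporaryFile leaks its generated name.
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
    path = tmp.name
with gzip.open(path, "wb") as f:
    f.write(b"col1|col2\n1|2\n")

with open(path, "rb") as f:
    header = f.read(64)
# Byte 3 is the FLG byte; bit 0x08 (FNAME) means a name is embedded.
assert header[3] & 0x08

# Writing into a file-like object: no FNAME field is written.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"col1|col2\n1|2\n")
assert not (buf.getvalue()[3] & 0x08)

os.remove(path)
```

This is why routing the compressed output through a buffer (or passing `filename=""` to `gzip.GzipFile`) would address the second suggestion above.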
- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
