akbarnotopb opened a new pull request, #26669:
URL: https://github.com/apache/airflow/pull/26669

   **Problems/Issue** 
   - The parquet files uploaded to GCS are too big: e.g., ~2000 rows of 77 columns take ~1MB as `csv` but ~50MB as `parquet`.
   
   **Suspect** 
   - At `line 246-248`, the current operator creates a `parquet table` for each row and immediately writes it to the file buffer.
   
   **Solution** 
   1. Store the rows in a temporary variable, `parquet_datas`.
   2. Write them to the file if:
      - their size exceeds `approx_max_file_size_bytes`, or
      - the last row has been reached.
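
The buffering logic described above can be sketched roughly as follows. This is a minimal, library-free illustration and not the operator's actual code: `write_rows_in_batches`, the `write_batch` callback, and the JSON-based size estimate are all hypothetical names and assumptions made for the example. In the operator, `write_batch` would stand for building one parquet table from all buffered rows and writing it once, instead of writing one table per row.

```python
import json
from typing import Callable, Iterable, List, Sequence


def write_rows_in_batches(
    rows: Iterable[Sequence],
    approx_max_file_size_bytes: int,
    write_batch: Callable[[List[Sequence]], None],
) -> int:
    """Buffer rows and pass them to write_batch in groups.

    Returns the number of batches that were written.
    """
    parquet_datas: List[Sequence] = []  # temporary buffer of rows
    buffered_bytes = 0
    batches = 0

    for row in rows:
        parquet_datas.append(row)
        # Assumption: the JSON-encoded length is used as a rough size estimate.
        buffered_bytes += len(json.dumps(row, default=str))

        # Flush once the buffered rows exceed the size threshold.
        if buffered_bytes > approx_max_file_size_bytes:
            write_batch(parquet_datas)
            batches += 1
            parquet_datas = []
            buffered_bytes = 0

    # Flush whatever is left after the last row.
    if parquet_datas:
        write_batch(parquet_datas)
        batches += 1

    return batches
```

With this shape, the number of writes depends on the size threshold rather than on the row count, which is the intent of the change.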

