eladkal commented on code in PR #36817:
URL: https://github.com/apache/airflow/pull/36817#discussion_r1457350670
##########
airflow/providers/google/CHANGELOG.rst:
##########
@@ -27,6 +27,16 @@
Changelog
---------
+.. note::
+ The default value of ``parquet_row_group_size`` in ``BaseSQLToGCSOperator``
has changed from 1 to
+ 100000, in order to have a default that provides better compression
efficiency and performance of
+ reading the data in the output Parquet files. In many cases, the previous
value of 1 resulted in
+ very large files, long task durations and out of memory issues. A default
value of 100000 may require
+ more memory to execute the operator, in which case users can override the
``parquet_row_group_size``
+ parameter in the operator. All operators that are derived from
``BaseSQLToGCSOperator`` are affected
+ when ``export_format`` is ``parquet``: ``MySQLToGCSOperator``,
``PrestoToGCSOperator``,
+ ``OracleToGCSOperator``, ``TrinoToGCSOperator``, ``MSSQLToGCSOperator`` and
``PostgresToGCSOperator``.
Review Comment:
This is a good explanation. I would even highlight that we consider this
change a bug fix, so that users understand that we weighed all the factors
and made an informed decision. That is much better than users thinking that we
overlooked it.
```suggestion
The default value of ``parquet_row_group_size`` in
``BaseSQLToGCSOperator`` has changed from 1 to
100000, in order to have a default that provides better compression
efficiency and performance of
reading the data in the output Parquet files. In many cases, the previous
value of 1 resulted in
very large files, long task durations and out of memory issues. A default
value of 100000 may require
more memory to execute the operator, in which case users can override the
``parquet_row_group_size``
parameter in the operator. All operators that are derived from
``BaseSQLToGCSOperator`` are affected
when ``export_format`` is ``parquet``: ``MySQLToGCSOperator``,
``PrestoToGCSOperator``,
``OracleToGCSOperator``, ``TrinoToGCSOperator``, ``MSSQLToGCSOperator``
and ``PostgresToGCSOperator``. Due to the above we treat this change as a bug fix.
```
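To make the reasoning in the note concrete, here is a rough sketch of how the row group size drives the number of row groups in an output file. It assumes each row group holds at most ``parquet_row_group_size`` rows. The helper name `row_group_count` and the sample row count are hypothetical, for illustration only, and are not part of the provider.

```python
import math


def row_group_count(total_rows: int, row_group_size: int) -> int:
    """Hypothetical helper: row groups needed to hold total_rows,
    assuming each group holds at most row_group_size rows."""
    if row_group_size < 1:
        raise ValueError("row_group_size must be at least 1")
    return math.ceil(total_rows / row_group_size)


total_rows = 1_000_000  # assumed export size, for illustration only

old_groups = row_group_count(total_rows, 1)        # previous default of 1
new_groups = row_group_count(total_rows, 100_000)  # new default of 100000

print(old_groups, new_groups)  # prints "1000000 10"
```

With the previous default, every exported row ends up in its own row group, which is consistent with the large files and long task durations described in the note. Users who are short on memory can still pass a smaller ``parquet_row_group_size`` to the operator.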