tachyonwill commented on a change in pull request #12630:
URL: https://github.com/apache/arrow/pull/12630#discussion_r827248653
##########
File path: python/pyarrow/parquet.py
##########
@@ -605,6 +605,9 @@ def _sanitize_table(table, new_schema, flavor):
If None, no encryption will be done.
The encryption properties can be created using:
``CryptoFactory.file_encryption_properties()``.
+write_batch_size : int, default None
+ Number of values to write to a page at a time. If None, use the default of
+ 1024.
Review comment:
The default is 1024:
https://github.com/apache/arrow/blob/3bf061783f4e1ab447d2eb0f487c0c4fce6d5b15/cpp/src/parquet/properties.h#L96
The way `data_page_size` and `write_batch_size` interact is that `write_batch_size`
values are written, then the size of the page is checked against
`data_page_size`, and if `data_page_size` is exceeded, we start a new page.
Normally, 1024 is fine for `write_batch_size`, but if the values are really
big (e.g. big strings) or `data_page_size` is really small, then a smaller
`write_batch_size` is needed to keep the page sizes close to `data_page_size`.
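
To illustrate the interaction described above, here is a rough pure-Python model, not the actual writer code. The function `simulate_page_sizes` and its `value_sizes` argument are hypothetical names made up for this sketch. It assumes the page size is simply the sum of the raw value sizes, ignoring encoding, compression, and levels.

```python
def simulate_page_sizes(value_sizes, data_page_size, write_batch_size=1024):
    """Return the byte size of each page under a simplified writer model.

    Hypothetical helper for illustration only: a batch of
    `write_batch_size` values is appended, and only then is the page
    size compared against `data_page_size`.
    """
    pages = []
    current = 0
    for start in range(0, len(value_sizes), write_batch_size):
        # The whole batch is written before any size check happens.
        current += sum(value_sizes[start:start + write_batch_size])
        if current > data_page_size:
            # Threshold exceeded: close this page and start a new one.
            pages.append(current)
            current = 0
    if current:
        # Flush whatever is left as a final, possibly small, page.
        pages.append(current)
    return pages


# 4096 values of 1000 bytes each, with a 64 KiB target page size.
value_sizes = [1000] * 4096
target = 64 * 1024

large_batches = simulate_page_sizes(value_sizes, target, write_batch_size=1024)
small_batches = simulate_page_sizes(value_sizes, target, write_batch_size=16)

print(len(large_batches), max(large_batches))
print(len(small_batches), max(small_batches))
```

In this model, the default batch size of 1024 produces pages of roughly 1 MB each, far above the 64 KiB target, while a batch size of 16 keeps every page within one batch (16000 bytes) of the target.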