yonipeleg33 commented on code in PR #9357:
URL: https://github.com/apache/arrow-rs/pull/9357#discussion_r2779221476


##########
parquet/src/file/properties.rs:
##########
@@ -468,7 +487,8 @@ impl Default for WriterPropertiesBuilder {
             data_page_size_limit: DEFAULT_PAGE_SIZE,
             data_page_row_count_limit: DEFAULT_DATA_PAGE_ROW_COUNT_LIMIT,
             write_batch_size: DEFAULT_WRITE_BATCH_SIZE,
-            max_row_group_size: DEFAULT_MAX_ROW_GROUP_SIZE,
+            max_row_group_row_count: Some(DEFAULT_MAX_ROW_GROUP_SIZE),

Review Comment:
   Done



##########
parquet/src/file/properties.rs:
##########
@@ -247,11 +251,25 @@ impl WriterProperties {
         self.write_batch_size
     }
 
-    /// Returns maximum number of rows in a row group.
+    /// Returns maximum number of rows in a row group, or `usize::MAX` if 
unlimited.
     ///
     /// For more details see 
[`WriterPropertiesBuilder::set_max_row_group_size`]
     pub fn max_row_group_size(&self) -> usize {

Review Comment:
   That makes sense, as we also deprecate the corresponding setter.  Done!



##########
parquet/src/file/properties.rs:
##########
@@ -45,6 +45,9 @@ pub const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = 
EnabledStatistics::Pag
 pub const DEFAULT_WRITE_PAGE_HEADER_STATISTICS: bool = false;
 /// Default value for [`WriterProperties::max_row_group_size`]
 pub const DEFAULT_MAX_ROW_GROUP_SIZE: usize = 1024 * 1024;
+/// Default value for [`WriterProperties::max_row_group_bytes`] (128 MB, same 
as the official Java
+/// implementation for `parquet.block.size`)
+pub const DEFAULT_MAX_ROW_GROUP_BYTES: usize = 128 * 1024 * 1024;

Review Comment:
   > Or should we set it?
   
   @alamb I think not, as it changes behaviour without users opting-in for that 
new behaviour.  `None` preserves the existing behaviour by default, which is no 
byte count limit at all.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to