[GitHub] [arrow-rs] alamb commented on a change in pull request #1110: Extends parquet fuzz tests to also tests nulls, dictionaries and row groups with multiple pages (#1053)

GitBox Mon, 10 Jan 2022 14:02:49 -0800


alamb commented on a change in pull request #1110:
URL: https://github.com/apache/arrow-rs/pull/1110#discussion_r781597286




##########
File path: parquet/src/arrow/arrow_reader.rs
##########
@@ -440,14 +477,16 @@ mod tests {
         /// Number of row group to write to parquet (row group size =
         /// num_row_groups / num_rows)
         num_row_groups: usize,
-        /// Total number of rows
+        /// Total number of rows per row group
         num_rows: usize,
         /// Size of batches to read back
         record_batch_size: usize,
-        /// Total number of batches to attempt to read.
-        /// `record_batch_size` * `num_iterations` should be greater
-        /// than `num_rows` to ensure the data can be read back completely
-        num_iterations: usize,

Review comment:
       I agree that `num_iterations` is redundant when `record_batch_size` is provided (which 
means the data is read back in `record_batch_size` chunks rather than all in 
one big chunk).
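
To illustrate why a separate iteration count is not needed, here is a minimal sketch. It assumes a hypothetical helper `batch_sizes` that stands in for the reader and is not part of the parquet crate or of this PR. Reading in `record_batch_size` chunks until no rows remain fixes the number of batches, so nothing is left for an iteration count to configure.

```rust
/// Hypothetical stand-in for a batch reader (not the parquet crate API):
/// returns the number of rows in each batch when `total_rows` rows are
/// read back in chunks of at most `record_batch_size` rows.
fn batch_sizes(total_rows: usize, record_batch_size: usize) -> Vec<usize> {
    assert!(record_batch_size > 0, "record_batch_size must be non-zero");
    let mut remaining = total_rows;
    let mut sizes = Vec::new();
    // Keep reading until the data is exhausted; the loop bound comes from
    // the data itself, not from a separately configured iteration count.
    while remaining > 0 {
        let n = remaining.min(record_batch_size);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}
```

For example, `batch_sizes(250, 100)` yields two full batches followed by one partial batch, and the row counts always sum to `total_rows`.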





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
