alamb commented on a change in pull request #1110:
URL: https://github.com/apache/arrow-rs/pull/1110#discussion_r781597286
##########
File path: parquet/src/arrow/arrow_reader.rs
##########
@@ -440,14 +477,16 @@ mod tests {
/// Number of row group to write to parquet (row group size =
/// num_row_groups / num_rows)
num_row_groups: usize,
- /// Total number of rows
+ /// Total number of rows per row group
num_rows: usize,
/// Size of batches to read back
record_batch_size: usize,
- /// Total number of batches to attempt to read.
- /// `record_batch_size` * `num_iterations` should be greater
- /// than `num_rows` to ensure the data can be read back completely
- num_iterations: usize,
Review comment:
I agree that `num_iterations` is redundant when `record_batch_size` is
provided (which means the data is not read in one big chunk, but in
chunks of `record_batch_size` rows)
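To illustrate why the field is redundant, here is a minimal sketch of how the number of batches follows from the remaining fields. The helper `expected_batches` is hypothetical and not part of the PR; it assumes `num_rows` is the row count per row group (as in the updated doc comment) and that the reader fills each batch across row group boundaries.

```rust
/// Hypothetical helper (not from the PR): number of record batches needed
/// to read back every row, assuming batches are filled across row groups.
fn expected_batches(num_row_groups: usize, num_rows: usize, record_batch_size: usize) -> usize {
    assert!(record_batch_size > 0, "record_batch_size must be non-zero");
    // `num_rows` is taken to mean rows per row group.
    let total_rows = num_row_groups * num_rows;
    // Ceiling division: the last batch may be only partially filled.
    (total_rows + record_batch_size - 1) / record_batch_size
}

fn main() {
    // 2 row groups of 100 rows each, read back 30 rows at a time.
    println!("{}", expected_batches(2, 100, 30));
}
```

Under those assumptions a test can simply read until the reader is exhausted, so no separate iteration count needs to be configured.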