etseidl commented on code in PR #7307: URL: https://github.com/apache/arrow-rs/pull/7307#discussion_r2015070079
########## parquet/src/arrow/arrow_reader/mod.rs: ########## @@ -239,6 +242,16 @@ impl<T> ArrowReaderBuilder<T> { ..self } } + + /// Include file row numbers in the output with the given column name + /// + /// This will add a column to the output record batch with the file row number + pub fn with_row_number_column(self, row_number_column: Option<String>) -> Self { + Self { + row_number_column, Review Comment: ```suggestion pub fn with_row_number_column(self, row_number_column: String) -> Self { Self { row_number_column: Some(row_number_column), ``` Since `row_number_column` defaults to `None` anyway. ########## parquet/src/errors.rs: ########## @@ -52,6 +52,9 @@ pub enum ParquetError { /// Returned when a function needs more data to complete properly. The `usize` field indicates /// the total number of bytes required, not the number of additional bytes. NeedMoreData(usize), + /// Returned when an operation needs to know the first row number of a row group, but the row + /// number is unknown. + RowGroupMetaDataMissingRowNumber, Review Comment: I'm not convinced we need a new error type here. I think you can just use `General` for this purpose. ########## parquet/src/arrow/array_reader/builder.rs: ########## @@ -356,14 +392,23 @@ mod tests { ) .unwrap(); - let array_reader = build_array_reader(fields.as_ref(), &mask, &file_reader).unwrap(); + let array_reader = build_array_reader( Review Comment: I think there should be two tests. `test_create_array_reader` should just pass `None` for the row number column and remain otherwise unchanged, and then a second `test_create_array_reader_with_row_numbers` which would be what you have here. ########## parquet/src/arrow/array_reader/builder.rs: ########## @@ -39,9 +39,29 @@ pub fn build_array_reader( field: Option<&ParquetField>, mask: &ProjectionMask, row_groups: &dyn RowGroups, + row_number_column: Option<String>, Review Comment: I wonder if it would be better to make this `Option<&str>` and then use `.as_deref()` rather than cloning where this (and the other `build_X` functions) are called. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org