tustvold commented on code in PR #1716:
URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878679531
##########
parquet/src/arrow/async_reader.rs:
##########
@@ -166,32 +167,17 @@ impl<T: AsyncRead + AsyncSeek + Unpin> ParquetRecordBatchStreamBuilder<T> {
     }
 
     /// Only read data from the provided column indexes
-    pub fn with_projection(self, projection: Vec<usize>) -> Self {
+    pub fn with_projection(self, mask: ProjectionMask) -> Self {
         Self {
-            projection: Some(projection),
+            projection: mask,
             ..self
         }
     }
 
     /// Build a new [`ParquetRecordBatchStream`]
     pub fn build(self) -> Result<ParquetRecordBatchStream<T>> {
-        let num_columns = self.schema.fields().len();
         let num_row_groups = self.metadata.row_groups().len();
 
-        let columns = match self.projection {
-            Some(projection) => {
-                if let Some(col) = projection.iter().find(|x| **x >= num_columns) {
-                    return Err(general_err!(
-                        "column projection {} outside bounds of schema 0..{}",
Review Comment:
This check was actually incorrect, as it validated the projection against the
arrow schema rather than the parquet schema. I think this demonstrates the
footgun-prone nature of the old index-based API.
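
As a minimal sketch of the new usage (assuming the builder exposes its
`ParquetMetaData` via `metadata()` as the reader builders do today; the file
path and the column indices `[0, 2]` are placeholders):

```rust
use futures::StreamExt;
use parquet::arrow::{ParquetRecordBatchStreamBuilder, ProjectionMask};
use parquet::errors::Result;
use tokio::fs::File;

async fn read_projected() -> Result<()> {
    // Placeholder path for illustration only.
    let file = File::open("data.parquet").await?;
    let builder = ParquetRecordBatchStreamBuilder::new(file).await?;

    // The mask is constructed against the *parquet* SchemaDescriptor, so
    // out-of-bounds indices are caught where the parquet schema is actually
    // known, instead of being checked against the arrow schema in build().
    let schema_descr = builder.metadata().file_metadata().schema_descr();
    let mask = ProjectionMask::roots(schema_descr, [0, 2]);

    let mut stream = builder.with_projection(mask).build()?;
    while let Some(batch) = stream.next().await {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```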