tustvold commented on code in PR #1716:
URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878679531
##########
parquet/src/arrow/async_reader.rs:
##########
@@ -166,32 +167,17 @@ impl<T: AsyncRead + AsyncSeek + Unpin> ParquetRecordBatchStreamBuilder<T> {
     }
 
     /// Only read data from the provided column indexes
-    pub fn with_projection(self, projection: Vec<usize>) -> Self {
+    pub fn with_projection(self, mask: ProjectionMask) -> Self {
         Self {
-            projection: Some(projection),
+            projection: mask,
             ..self
         }
     }
 
     /// Build a new [`ParquetRecordBatchStream`]
     pub fn build(self) -> Result<ParquetRecordBatchStream<T>> {
-        let num_columns = self.schema.fields().len();
         let num_row_groups = self.metadata.row_groups().len();
 
-        let columns = match self.projection {
-            Some(projection) => {
-                if let Some(col) = projection.iter().find(|x| **x >= num_columns) {
-                    return Err(general_err!(
-                        "column projection {} outside bounds of schema 0..{}",
Review Comment:
This check was actually incorrect, as it validated the projection against the
arrow schema rather than the parquet schema. I think this demonstrates the
footgun-prone nature of the old index-based API.
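
As a minimal sketch of the new usage (assuming the builder exposes its
`ParquetMetaData` via `metadata()` as the reader builders do today; the file
path and the column indices `[0, 2]` are placeholders):

```rust
use futures::StreamExt;
use parquet::arrow::{ParquetRecordBatchStreamBuilder, ProjectionMask};
use parquet::errors::Result;
use tokio::fs::File;

async fn read_projected() -> Result<()> {
    // Placeholder path for illustration only.
    let file = File::open("data.parquet").await?;
    let builder = ParquetRecordBatchStreamBuilder::new(file).await?;

    // The mask is constructed against the *parquet* SchemaDescriptor, so
    // out-of-bounds indices are caught where the parquet schema is actually
    // known, instead of being checked against the arrow schema in build().
    let schema_descr = builder.metadata().file_metadata().schema_descr();
    let mask = ProjectionMask::roots(schema_descr, [0, 2]);

    let mut stream = builder.with_projection(mask).build()?;
    while let Some(batch) = stream.next().await {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```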