kosiew commented on code in PR #19678:
URL: https://github.com/apache/datafusion/pull/19678#discussion_r2681365222


##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -411,9 +412,13 @@ fn pushdown_columns(
     let allow_list_columns = supports_list_predicates(expr);
     let mut checker = PushdownChecker::new(file_schema, allow_list_columns);
     expr.visit(&mut checker)?;
-    Ok((!checker.prevents_pushdown()).then_some(PushdownColumns {
-        required_columns: checker.required_columns,
-        nested: checker.nested_behavior,
+    let prevents_pushdown = checker.prevents_pushdown();
+    let nested = checker.nested_behavior;
+    let mut required_columns = checker.required_columns;
+    required_columns.sort_unstable();

Review Comment:
   how about adding:
   ```rust
   impl PushdownChecker {
       fn into_sorted_columns(mut self) -> PushdownColumns {
           self.required_columns.sort_unstable();
           self.required_columns.dedup(); // this removes the need for contains 
check
           PushdownColumns {
               required_columns: self.required_columns,
               nested: self.nested_behavior,
           }
       }
   ```



##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -390,9 +392,8 @@ enum NestedColumnSupport {
     Unsupported,
 }
 
-#[derive(Debug)]

Review Comment:
   Is there a reason to remove 
   #[derive(Debug)]
   ?



##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -307,7 +307,9 @@ impl<'schema> PushdownChecker<'schema> {
             }
         };
 
-        self.required_columns.insert(idx);
+        if !self.required_columns.contains(&idx) {
+            self.required_columns.push(idx);
+        }

Review Comment:
   I think it would help future contributors to
   - Add comment explaining linear search is acceptable for small `n`
      - OR switch to `HashSet` for O(1) deduplication if `n` might grow



##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -307,7 +307,9 @@ impl<'schema> PushdownChecker<'schema> {
             }
         };
 
-        self.required_columns.insert(idx);
+        if !self.required_columns.contains(&idx) {
+            self.required_columns.push(idx);
+        }

Review Comment:
   see suggested
   
   fn into_sorted_columns 
   below



##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -411,9 +412,13 @@ fn pushdown_columns(
     let allow_list_columns = supports_list_predicates(expr);
     let mut checker = PushdownChecker::new(file_schema, allow_list_columns);
     expr.visit(&mut checker)?;
-    Ok((!checker.prevents_pushdown()).then_some(PushdownColumns {
-        required_columns: checker.required_columns,
-        nested: checker.nested_behavior,
+    let prevents_pushdown = checker.prevents_pushdown();
+    let nested = checker.nested_behavior;
+    let mut required_columns = checker.required_columns;
+    required_columns.sort_unstable();
+    Ok((!prevents_pushdown).then_some(PushdownColumns {
+        required_columns,
+        nested,
     }))

Review Comment:
   and then rewriting this to:
   
   ```rust
   Ok((!checker.prevents_pushdown())
       .then_some(checker.into_sorted_columns()))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to