[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #5268: Support page skipping / page_index pushdown for evolved schemas

via GitHub Mon, 13 Feb 2023 22:08:14 -0800


Ted-Jiang commented on code in PR #5268:
URL: https://github.com/apache/arrow-datafusion/pull/5268#discussion_r1105333170



##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -233,25 +233,18 @@ impl PruningPredicate {
             .unwrap_or_default()
     }
 
-    /// Returns all need column indexes to evaluate this pruning predicate
-    pub(crate) fn need_input_columns_ids(&self) -> HashSet<usize> {
-        let mut set = HashSet::new();
-        self.required_columns.columns.iter().for_each(|x| {
-            match self.schema().column_with_name(x.0.name.as_str()) {
-                None => {}
-                Some(y) => {
-                    set.insert(y.0);
-                }
-            }
-        });
-        set
+    pub(crate) fn required_columns(&self) -> &RequiredStatColumns {

Review Comment:
   Thanks for the explanation! 👍 
   > If an individual parquet file does not have all the columns or has the columns in a different order
   
   I have a question: given `file_a(c1, c2)` and `file_b(c1, c3)`, does DataFusion support creating an external table `t(c1)` over both file_a and file_b? 🤔 
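
To make the scenario concrete, here is a minimal sketch of resolving table columns against each file's columns by name, which is roughly what the quoted sentence implies is needed when files lack columns or order them differently. This is not DataFusion's actual implementation: `resolve_file_indices` is a hypothetical helper, and plain string slices stand in for real Arrow schemas.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for each table column, find its position in one
/// file's column list by name. `None` means the file lacks that column.
fn resolve_file_indices(table_cols: &[&str], file_cols: &[&str]) -> Vec<Option<usize>> {
    // Build a name -> index lookup for this particular file.
    let by_name: HashMap<&str, usize> = file_cols
        .iter()
        .enumerate()
        .map(|(idx, name)| (*name, idx))
        .collect();

    // Look up every table column; the index may differ from file to file.
    table_cols
        .iter()
        .map(|name| by_name.get(name).copied())
        .collect()
}
```

Under these assumptions, table `t(c1)` resolves to index 0 in both `file_a(c1, c2)` and `file_b(c1, c3)`, while a table column that a file does not contain resolves to `None` for that file.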



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
