jonahgao commented on code in PR #8840:
URL: https://github.com/apache/arrow-datafusion/pull/8840#discussion_r1467164928


##########
datafusion/core/src/physical_optimizer/projection_pushdown.rs:
##########
@@ -832,16 +834,22 @@ fn all_alias_free_columns(exprs: &[(Arc<dyn PhysicalExpr>, String)]) -> bool {
 fn new_projections_for_columns(
     projection: &ProjectionExec,
     source: &Option<Vec<usize>>,
-) -> Vec<usize> {
-    projection
-        .expr()
-        .iter()
-        .filter_map(|(expr, _)| {
-            expr.as_any()
-                .downcast_ref::<Column>()
-                .and_then(|expr| source.as_ref().map(|proj| proj[expr.index()]))
-        })
-        .collect()
+) -> Option<Vec<usize>> {
+    if source.is_none() {

Review Comment:
   I'm afraid this bug might not be fixed yet.
   Returning `None` seems to select all the columns of the CSV table `balance`, so the optimized plan ends up with a different schema than the original one:
   ```sh
   DataFusion CLI v34.0.0
   ❯ CREATE EXTERNAL TABLE balance STORED as CSV WITH HEADER ROW LOCATION '../testing/data/csv/r_cte_balance.csv';
   0 rows in set. Query took 0.026 seconds.
   
   ❯ set datafusion.optimizer.max_passes=0;
   0 rows in set. Query took 0.002 seconds.
   
   ❯ select time from balance;
   ProjectionPushdown
   caused by
   Internal error: PhysicalOptimizer rule 'ProjectionPushdown' failed, due to generate a different schema, 
   
   original schema: Schema { fields: [
       Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, 
   
   new schema: Schema { fields: 
       Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
       Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
       Field { name: "account_balance", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }.
   
   This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```
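
   To illustrate the concern, here is a minimal, self-contained sketch of one possible way to handle a scan that has no projection. It is not the PR's code and not necessarily the right fix. `remap_projection` is a hypothetical helper, and plain `usize` indices stand in for the `Column` expressions of the `ProjectionExec`. The assumption is that a `None` source projection means "all columns", so the column indices already refer to table columns and can be used directly:

   ```rust
   /// Hypothetical helper (not part of DataFusion): maps the column indices
   /// referenced by a projection onto the columns of the underlying scan.
   /// Assumption: a `None` source projection means the scan reads every column.
   fn remap_projection(column_indices: &[usize], source: &Option<Vec<usize>>) -> Vec<usize> {
       match source {
           // The scan already projects a subset: translate each index through it.
           Some(proj) => column_indices.iter().map(|&i| proj[i]).collect(),
           // No projection on the scan: the indices already point at table
           // columns, so keep them as they are instead of returning `None`.
           None => column_indices.to_vec(),
       }
   }
   ```

   Under that assumption, `select time from balance` references only column 0, so `remap_projection(&[0], &None)` yields `[0]` rather than widening the scan to all three columns.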
   


