alamb commented on code in PR #6566:
URL: https://github.com/apache/arrow-datafusion/pull/6566#discussion_r1221355361


##########
datafusion/core/src/execution/context.rs:
##########
@@ -518,7 +518,7 @@ impl SessionContext {
                 let physical = DataFrame::new(self.state(), input);
 
                 let batches: Vec<_> = physical.collect_partitioned().await?;
-                let table = Arc::new(MemTable::try_new(schema, batches)?);
+                let table = Arc::new(MemTable::new_not_registered(schema, batches));

Review Comment:
   I think using the same names in the physical and logical plans is
   preferable, because the rest of the code expects this and sometimes assumes
   it is the case (because it mostly is).
   
   If we don't make the logical and physical plans match up, I predict we will
   keep hitting a long tail of schema-mismatch bugs caused by this discrepancy,
   which show up only when window functions are used.
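   To make the invariant concrete, here is a minimal sketch of the kind of
   check that catches these mismatches. `name_mismatches` is a hypothetical
   helper, not a DataFusion API, and it compares plain lists of output field
   names rather than Arrow `Schema`s:

   ```rust
   /// Hypothetical check (not a DataFusion API): compare the output column
   /// names of a logical plan with those of the physical plan built from it
   /// and describe every position where they differ.
   pub fn name_mismatches(logical: &[&str], physical: &[&str]) -> Vec<String> {
       let mut problems = Vec::new();
       if logical.len() != physical.len() {
           problems.push(format!(
               "column count differs: logical has {}, physical has {}",
               logical.len(),
               physical.len()
           ));
       }
       // `zip` stops at the shorter list; the length check above covers the rest.
       for (i, (l, p)) in logical.iter().zip(physical.iter()).enumerate() {
           if l != p {
               problems.push(format!("column {i}: logical `{l}` vs physical `{p}`"));
           }
       }
       problems
   }
   ```

   Code that looks columns up by name (for example when re-registering a
   `MemTable` built from collected batches) silently depends on this list
   being empty.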
   
   If the long display name is a problem (and I can see how it would be),
   perhaps we can figure out how to make `display_name` produce something
   shorter for window functions, rather than serializing the entire window
   definition.
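   A rough sketch of the idea, using a simplified, hypothetical `WindowExpr`
   struct rather than DataFusion's `Expr`. The long-name format below is only
   illustrative and is not DataFusion's exact `display_name` output:

   ```rust
   /// Hypothetical, simplified window expression (not DataFusion's `Expr`).
   pub struct WindowExpr {
       pub func: String,
       pub args: Vec<String>,
       pub partition_by: Vec<String>,
       pub order_by: Vec<String>,
   }

   impl WindowExpr {
       /// Long name: serializes the whole window definition (illustrative format).
       pub fn full_name(&self) -> String {
           let mut name = format!("{}({})", self.func, self.args.join(", "));
           if !self.partition_by.is_empty() {
               name.push_str(&format!(" PARTITION BY [{}]", self.partition_by.join(", ")));
           }
           if !self.order_by.is_empty() {
               name.push_str(&format!(" ORDER BY [{}]", self.order_by.join(", ")));
           }
           name
       }

       /// Short name: only the function and its arguments. PostgreSQL goes
       /// further and uses just the function name as the column header.
       pub fn short_name(&self) -> String {
           format!("{}({})", self.func, self.args.join(", "))
       }
   }
   ```

   The short form is readable, but two window expressions that differ only in
   their `OVER` clause now get the same name.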
   
   Here is what Postgres does:
   
   ```sql
   postgres=# select first_value(x) over (order by x) from foo;
    first_value
   -------------
              1
   (1 row)
   ```
   
   We probably need something more sophisticated, though, since DataFusion
   requires distinct column names.
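   One possible scheme, sketched with a hypothetical `make_unique` helper
   (not something DataFusion provides), is to keep the short names and add a
   numeric suffix to any duplicates:

   ```rust
   use std::collections::HashSet;

   /// Hypothetical helper: make every output column name unique by appending
   /// `_1`, `_2`, ... to later duplicates. Just one possible scheme.
   pub fn make_unique(names: &[String]) -> Vec<String> {
       // Every name handed out so far, including generated suffixed names.
       let mut used: HashSet<String> = HashSet::new();
       let mut out = Vec::with_capacity(names.len());
       for name in names {
           let mut candidate = name.clone();
           let mut n = 1;
           // Bump the suffix until the candidate does not collide with
           // anything already used.
           while used.contains(&candidate) {
               candidate = format!("{name}_{n}");
               n += 1;
           }
           used.insert(candidate.clone());
           out.push(candidate);
       }
       out
   }
   ```

   Whichever scheme is used would have to be applied identically when
   building the logical and physical plans, otherwise we are back to the
   mismatch problem. Note that which duplicate gets a suffix depends on the
   order of the columns.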
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
