[GitHub] [arrow] alamb edited a comment on pull request #8839: ARROW-10732: [Rust] [DataFusion] Integrate DFSchema as a step towards supporting qualified column names

GitBox Mon, 07 Dec 2020 05:58:48 -0800


alamb edited a comment on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739934509



   > As you can see, the data_type and nullable use the schema from the plan 
whereas the evaluate method uses the schema from the record batch, which is a 
little inconsistent. They should probably all use the same schema.
   
   I agree -- I recommend using the schema from the plan for consistency.
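
   To make that concrete, here is a minimal sketch of an expression that is bound to the plan schema once and then answers `data_type`, `nullable`, and `evaluate` from that single binding. `Field`, `Batch`, and `BoundColumn` are hypothetical stand-ins rather than the Arrow or DataFusion types, and the sketch assumes the batch columns arrive in the same order as the plan schema.

```rust
/// Hypothetical stand-in for a schema field (not the Arrow `Field`).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Hypothetical stand-in for a record batch: one Vec per column.
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// A column expression resolved against the plan schema at planning time.
struct BoundColumn {
    index: usize,
    field: Field,
}

impl BoundColumn {
    /// Look the column up by name in the plan schema, exactly once.
    fn bind(plan_schema: &[Field], name: &str) -> Option<Self> {
        plan_schema
            .iter()
            .position(|f| f.name == name)
            .map(|index| BoundColumn {
                index,
                field: plan_schema[index].clone(),
            })
    }

    /// Type and nullability come from the plan schema field.
    fn data_type(&self) -> &str {
        &self.field.data_type
    }

    fn nullable(&self) -> bool {
        self.field.nullable
    }

    /// Evaluation uses the index derived from the plan schema, so it never
    /// consults the batch's own column names (assumes matching column order).
    fn evaluate<'a>(&self, batch: &'a Batch) -> Option<&'a Vec<i64>> {
        batch.columns.get(self.index)
    }
}
```

   With a plan schema of `t1.a` and `t2.a`, binding `t2.a` yields index 1, and all three methods then agree on which column they describe.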
   
   > This IMO leaves us with 2., which is what I would try: change the physical 
planner to alias/rewrite column names with the qualifier when the physical plan 
is created. This will cause the resulting RecordBatch's schema to have columns 
named t1.a and t2.a, thereby guaranteeing the invariant that the output schema 
of the physical execution matches the schema of the logical plan.
   
   
   I agree with @jorgecarleitao's recommendation: when moving from the logical 
plan to the physical plan, we should always use the fully qualified name of the 
field, which avoids ambiguity. If we don't like `t1.foo` being sprinkled around 
in plans that have only one table, or whose column names aren't ambiguous, we 
could implement a (logical plan) optimizer pass to remove the unneeded 
qualifiers.
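
   As a rough sketch of the two steps (always qualify on the way to the physical plan, then optionally strip qualifiers that aren't needed), something like the following could work. `QualifiedField`, `physical_name`, and `strip_unneeded_qualifiers` are hypothetical names for illustration only, not existing DataFusion APIs.

```rust
use std::collections::HashMap;

/// Hypothetical logical-plan field: an optional table qualifier plus a name.
#[derive(Debug, Clone, PartialEq)]
struct QualifiedField {
    qualifier: Option<String>,
    name: String,
}

/// Name to use in the physical schema: qualified whenever a qualifier exists.
fn physical_name(field: &QualifiedField) -> String {
    match &field.qualifier {
        Some(q) => format!("{}.{}", q, field.name),
        None => field.name.clone(),
    }
}

/// Optional pass: drop the qualifier from every field whose bare name is
/// unique within the schema; fields with colliding names keep theirs.
fn strip_unneeded_qualifiers(fields: &[QualifiedField]) -> Vec<QualifiedField> {
    // Count how many fields share each bare column name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in fields {
        *counts.entry(f.name.as_str()).or_insert(0) += 1;
    }
    fields
        .iter()
        .map(|f| {
            if counts[f.name.as_str()] == 1 {
                QualifiedField {
                    qualifier: None,
                    name: f.name.clone(),
                }
            } else {
                f.clone()
            }
        })
        .collect()
}
```

   For fields `t1.a`, `t2.a`, and `t1.b`, `physical_name` keeps all three qualified, while the stripping pass leaves `t1.a` and `t2.a` alone and reduces `t1.b` to `b`.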


