[GitHub] [arrow] alamb commented on pull request #8839: ARROW-10732: [Rust] [DataFusion] Integrate DFSchema as a step towards supporting qualified column names

GitBox Mon, 07 Dec 2020 05:58:48 -0800


alamb commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739934509



   > As you can see, the `data_type` and `nullable` use the schema from the plan 
whereas the `evaluate` method uses the schema from the record batch, which is a 
little inconsistent. They should probably all use the same schema.
   
   I agree -- I recommend using the schema from the plan for consistency.
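   To illustrate "use the schema from the plan for consistency", here is a minimal sketch, assuming heavily simplified, hypothetical `DataType`, `Field`, `Schema` and `Column` types (these are NOT the arrow-rs or DataFusion types). All three methods resolve the column against the same plan schema, so they cannot disagree about which field a name refers to. The `columns` slice stands in for a record batch's arrays and is assumed to be positionally aligned with the plan schema.

```rust
/// Hypothetical stand-ins for Arrow types; not the real arrow-rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

impl Schema {
    /// Position of the field with the given name, if present.
    pub fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f.name == name)
    }
}

/// Hypothetical column-reference expression.
pub struct Column {
    pub name: String,
}

impl Column {
    /// Type comes from the plan's input schema.
    pub fn data_type(&self, input: &Schema) -> Option<DataType> {
        input.index_of(&self.name).map(|i| input.fields[i].data_type)
    }

    /// Nullability also comes from the plan's input schema.
    pub fn nullable(&self, input: &Schema) -> Option<bool> {
        input.index_of(&self.name).map(|i| input.fields[i].nullable)
    }

    /// Evaluation looks the column up in the SAME plan schema and only then
    /// picks the matching array from the batch, rather than consulting the
    /// batch's own schema.
    pub fn evaluate<'a, T>(&self, input: &Schema, columns: &'a [T]) -> Option<&'a T> {
        input.index_of(&self.name).and_then(|i| columns.get(i))
    }
}
```

   The design choice here is simply that one schema is the single source of truth for name resolution; if the batch's schema ever diverged from it, the sketch would still agree with the plan.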
   
   > This IMO leaves us with 2., which is what I would try: change the physical 
planner to alias/rewrite column names with the qualifier when the physical plan 
is created. This will cause the resulting `RecordBatch`'s schema to have columns 
named `t1.a` and `t2.a`, thereby guaranteeing the invariant that the output schema 
of the physical execution matches the schema of the logical plan.
   
   
   I agree with this recommendation: when moving from the logical to the 
physical plan, we should always use the fully qualified name of each field, 
which avoids ambiguity. If we don't like `t1.foo` being sprinkled around in 
plans that only have one table, or where the column names aren't ambiguous, 
we could implement a (logical plan) optimizer pass to remove unneeded 
qualifiers.
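   The two steps discussed above (always qualifying names when creating the physical plan, plus an optional pass that strips qualifiers that aren't needed) can be sketched as follows. This is a minimal sketch, assuming a hypothetical `QualifiedField` type loosely inspired by the idea behind DFSchema; the type and the `physical_column_names` / `strip_unneeded_qualifiers` functions are illustrative only and are not DataFusion's API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical logical-plan field: optional relation qualifier plus bare name.
/// NOT DataFusion's DFField/DFSchema API.
#[derive(Debug, Clone, PartialEq)]
pub struct QualifiedField {
    pub qualifier: Option<String>,
    pub name: String,
}

impl QualifiedField {
    /// Fully qualified name such as "t1.a"; unqualified fields keep the bare name.
    pub fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Logical -> physical: name every output column by its fully qualified name
/// and reject any remaining collision, so the physical output schema can match
/// the logical schema one-to-one.
pub fn physical_column_names(fields: &[QualifiedField]) -> Result<Vec<String>, String> {
    let mut seen = HashSet::new();
    let mut names = Vec::with_capacity(fields.len());
    for f in fields {
        let name = f.qualified_name();
        if !seen.insert(name.clone()) {
            return Err(format!("ambiguous column name: {}", name));
        }
        names.push(name);
    }
    Ok(names)
}

/// Optional logical optimizer pass: drop a qualifier only when the bare name
/// is unique among `fields`, so stripping it cannot introduce ambiguity.
pub fn strip_unneeded_qualifiers(fields: &[QualifiedField]) -> Vec<QualifiedField> {
    // Count how many fields share each bare name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in fields {
        *counts.entry(f.name.as_str()).or_insert(0) += 1;
    }
    fields
        .iter()
        .map(|f| QualifiedField {
            qualifier: if counts[f.name.as_str()] == 1 { None } else { f.qualifier.clone() },
            name: f.name.clone(),
        })
        .collect()
}
```

   For a join of `t1(a, b)` with `t2(a)`, the physical names would be `t1.a`, `t2.a`, `t1.b`, and the stripping pass would keep `t1.a` and `t2.a` qualified while reducing `t1.b` to `b`. A real pass would have to consider the names visible at each plan node rather than a single flat field list; this sketch only shows the uniqueness rule.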


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
