Jefffrey commented on issue #5211:
URL: https://github.com/apache/arrow-datafusion/issues/5211#issuecomment-1426922495

   Although we could fix this by adding the relevant ambiguity check to the DataFrame API methods (before they call the LogicalPlanBuilder), I wonder if it would be better to push that ambiguity check into the LogicalPlanBuilder methods themselves. Otherwise the check would be duplicated, once in the SQL planner and once in the DataFrame API, even though both go through the same underlying API, LogicalPlanBuilder. A rough sketch of the idea is below.
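
   To illustrate (this is not the actual DataFusion code), here is a minimal, self-contained Rust sketch with stand-in `Field`/`Schema`/`PlanBuilder` types, showing how a single ambiguity check living inside the builder would be hit by both the SQL planner and the DataFrame API paths:

```rust
// Minimal, self-contained sketch with stand-in types (not DataFusion's real
// `DFSchema`/`LogicalPlanBuilder`): a single ambiguity check inside the
// builder that both the SQL planner and the DataFrame API would go through.

struct Field {
    relation: Option<String>, // table/relation qualifier, e.g. "t1"
    name: String,
}

struct Schema {
    fields: Vec<Field>,
}

struct PlanBuilder {
    schema: Schema,
}

impl PlanBuilder {
    /// Error if an unqualified column name matches more than one field in the
    /// builder's schema, rather than silently picking one.
    fn resolve_unqualified(&self, name: &str) -> Result<&Field, String> {
        let matches: Vec<&Field> = self
            .schema
            .fields
            .iter()
            .filter(|f| f.name == name)
            .collect();
        match matches.len() {
            0 => Err(format!("column '{name}' not found")),
            1 => Ok(matches[0]),
            _ => {
                let candidates: Vec<String> = matches
                    .iter()
                    .map(|f| f.relation.clone().unwrap_or_default())
                    .collect();
                Err(format!(
                    "column reference '{name}' is ambiguous (found in: {})",
                    candidates.join(", ")
                ))
            }
        }
    }
}

fn main() {
    let builder = PlanBuilder {
        schema: Schema {
            fields: vec![
                Field { relation: Some("t1".into()), name: "id".into() },
                Field { relation: Some("t2".into()), name: "id".into() },
            ],
        },
    };
    // An unqualified `id` matches fields from both join inputs, so this errors.
    assert!(builder.resolve_unqualified("id").is_err());
}
```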
   
   Part of this could be to introduce the check into the `normalize_col` function itself (or rather into the underlying function it calls):
   
   
https://github.com/apache/arrow-datafusion/blob/f75d25fec2c1a5581eeb8ce73a890e5792df02c7/datafusion/expr/src/expr_rewriter.rs#L344-L348
   
   Which calls the underlying `normalize_with_schemas`:
   
   
https://github.com/apache/arrow-datafusion/blob/f75d25fec2c1a5581eeb8ce73a890e5792df02c7/datafusion/common/src/column.rs#L86-L109
   
   One thing that's tripping me up is the use of `all_schemas()` in `normalize_col`, since that is what allows `normalize_with_schemas` to find a schema containing only one of the fields. The usual sequence is:
   
   - Check the output schema first; two matching fields are found because of the ambiguity
   - Don't return (since it's not a using column), so check the next schema in the iteration (e.g. the left input)
   - That schema contains only one of the ambiguous fields, so it is returned as the normalized column
   
   We could instead enforce that if ambiguity is detected in the first iteration, the lookup fails there, along the lines of the sketch below.
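
   As a rough sketch of that proposal (again with simplified stand-in types and a hypothetical `normalize` function, not the real `normalize_with_schemas` signature): ambiguity in the first (output) schema becomes a hard error unless the name is a using column, rather than falling through to a later input schema:

```rust
// Rough sketch of the proposed behaviour with simplified stand-in types and a
// hypothetical `normalize` function (not the real `normalize_with_schemas`
// signature): ambiguity in the first (output) schema is a hard error unless
// the name is a using column, instead of falling through to a later schema.

#[derive(Clone)]
struct Field {
    relation: Option<String>, // table/relation qualifier, e.g. "t1"
    name: String,
}

fn normalize(
    name: &str,
    schemas: &[&[Field]],
    using_columns: &[String],
) -> Result<Field, String> {
    for (i, schema) in schemas.iter().enumerate() {
        let matches: Vec<&Field> = schema.iter().filter(|f| f.name == name).collect();
        match matches.len() {
            // Not present in this schema; try the next one.
            0 => continue,
            // Exactly one match: unambiguous, return it.
            1 => return Ok(matches[0].clone()),
            _ => {
                // A USING column legitimately appears in both join inputs.
                if using_columns.iter().any(|c| c == name) {
                    return Ok(matches[0].clone());
                }
                // Proposed change: if the first schema (the combined output
                // schema) is already ambiguous, fail here instead of falling
                // through to an input schema that has only one of the fields.
                if i == 0 {
                    return Err(format!("column reference '{name}' is ambiguous"));
                }
                // Otherwise keep the current fall-through behaviour.
            }
        }
    }
    Err(format!("column '{name}' not found"))
}

fn main() {
    let left = [Field { relation: Some("t1".into()), name: "id".into() }];
    let right = [Field { relation: Some("t2".into()), name: "id".into() }];
    let output = [left[0].clone(), right[0].clone()];
    let schemas: [&[Field]; 3] = [&output, &left, &right];

    // Ambiguous `id` in the output schema now errors instead of silently
    // resolving to `t1.id` via the left input schema.
    assert!(normalize("id", &schemas, &[]).is_err());

    // A `USING (id)` join is still allowed to resolve the duplicated name.
    let resolved = normalize("id", &schemas, &["id".to_string()]).unwrap();
    println!("using column resolved to {:?}.{}", resolved.relation, resolved.name);
}
```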

