goldmedal commented on PR #11035: URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2293855497
> Yes, I saw something like that in the code: using tmp_table as the default alias. But I'm not sure if it is the right way, because it might cause problems when resolving column names?

@holicc After some experimentation, I found that it's not straightforward. I tried implementing a `TableProvider` with a custom `get_logical_plan` method to set an alias for the table by default. However, I found that the internal plan is invoked during the analysis phase, which is too late to modify column names since all projections have already been planned. The plan will look like this:

```sql
> EXPLAIN SELECT sum(a) FROM '/Users/jax/git/datafusion/datafusion/core/tests/data/2.json'
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                    |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]]                    |
|               |   SubqueryAlias: /Users/jax/git/datafusion/datafusion/core/tests/data/2.json                                            |
|               |     TableScan: ?url? projection=[a]                                                                                     |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]            |
|               |   CoalescePartitionsExec                                                                                                |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]      |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1                                              |
|               |         JsonExec: file_groups={1 group: [[Users/jax/git/datafusion/datafusion/core/tests/data/2.json]]}, projection=[a] |
|               |                                                                                                                         |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
```

If we want to improve readability, we might need to create an `AnalyzerRule` for it.
However, this is not easy due to the complexity of column resolution, as you mentioned. I think that we could address this issue in a separate pull request if needed.

A simpler solution is to manually add an alias when querying:

```sql
> EXPLAIN SELECT sum(a) FROM '/Users/jax/git/datafusion/datafusion/core/tests/data/2.json' as t
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                    |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[sum(t.a)]]                                                                              |
|               |   SubqueryAlias: t                                                                                                      |
|               |     TableScan: /Users/jax/git/datafusion/datafusion/core/tests/data/2.json projection=[a]                               |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[sum(t.a)]                                                                      |
|               |   CoalescePartitionsExec                                                                                                |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[sum(t.a)]                                                                |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1                                              |
|               |         JsonExec: file_groups={1 group: [[Users/jax/git/datafusion/datafusion/core/tests/data/2.json]]}, projection=[a] |
|               |                                                                                                                         |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
```

This is a straightforward way to produce a more readable plan without complicating the code.

cc @alamb

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org