houqp edited a comment on pull request #952: URL: https://github.com/apache/arrow-datafusion/pull/952#issuecomment-908069027
Our current output field name semantics mostly align with Spark's, which strips column qualifiers in all cases. This PR changes the semantics to handle compound and bare column names differently. For bare column names we strip the qualifiers, but for compound columns we keep them.

Unlike Spark, relational databases like MySQL, PostgreSQL and SQLite all treat compound and bare column names differently. All three strip qualifiers for bare column names, as we do. For compound column names, however, MySQL and SQLite use the raw user query input, while PostgreSQL just uses `?column?` for all of them.

In all of these engines, users are not expected to reference compound columns by their generated names, because these names are not guaranteed to be valid SQL expressions. Instead, they should always alias them explicitly. As a result, the output field names for compound columns are there for display and debugging purposes only. See also @Dandandan's comment at https://github.com/apache/arrow-datafusion/pull/280#issuecomment-834805975.

@jorgecarleitao, as for the counter-intuitive example you mentioned: the current implementation outputs the field name `SUM(id)` for the query `SELECT SUM(t1.id)`, while the proposed behavior outputs the field name `SUM(t1.id)` for the query `SELECT SUM(id)`. So neither behavior uses the exact user query input as the output field name. Either way, it should have no impact on how users construct queries.

The proposed behavior provides a better UX than PostgreSQL's `?column?` column name. I think it's also an improvement over the current (Spark's) behavior, because it produces an unambiguous column name for queries like `SELECT t1.id * t2.id FROM t1 JOIN t2 USING (id)`. The current behavior outputs `id * id`, which is not as good as `t1.id * t2.id` IMO.
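To make the difference between the two naming rules concrete, here is a minimal sketch in Rust. It uses a hypothetical, simplified `Expr` enum and hypothetical helpers (`output_field_name`, `unqualified_name`, `col`); none of these are DataFusion's actual types or APIs. It also assumes the planner has already resolved a bare `id` inside `SUM(id)` to `t1.id`, which is how `SELECT SUM(id)` ends up named `SUM(t1.id)` under the proposed rule.

```rust
// Hypothetical, simplified expression type; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone)]
enum Expr {
    /// Column reference, optionally qualified by a relation (e.g. `t1.id`).
    Column { relation: Option<String>, name: String },
    /// Binary operation such as `t1.id * t2.id`.
    BinaryExpr { left: Box<Expr>, op: String, right: Box<Expr> },
    /// Aggregate call such as `SUM(t1.id)`.
    Aggregate { func: String, arg: Box<Expr> },
}

impl Expr {
    /// Display form that keeps every column qualifier.
    fn qualified_name(&self) -> String {
        match self {
            Expr::Column { relation: Some(r), name } => format!("{}.{}", r, name),
            Expr::Column { relation: None, name } => name.clone(),
            Expr::BinaryExpr { left, op, right } => {
                format!("{} {} {}", left.qualified_name(), op, right.qualified_name())
            }
            Expr::Aggregate { func, arg } => format!("{}({})", func, arg.qualified_name()),
        }
    }

    /// Current (Spark-like) rule: strip qualifiers everywhere.
    fn unqualified_name(&self) -> String {
        match self {
            Expr::Column { name, .. } => name.clone(),
            Expr::BinaryExpr { left, op, right } => {
                format!("{} {} {}", left.unqualified_name(), op, right.unqualified_name())
            }
            Expr::Aggregate { func, arg } => format!("{}({})", func, arg.unqualified_name()),
        }
    }

    /// Proposed rule: bare columns drop their qualifier,
    /// compound expressions keep qualifiers on every column.
    fn output_field_name(&self) -> String {
        match self {
            Expr::Column { name, .. } => name.clone(),
            other => other.qualified_name(),
        }
    }
}

/// Hypothetical helper that builds a qualified column reference.
fn col(relation: &str, name: &str) -> Expr {
    Expr::Column { relation: Some(relation.to_string()), name: name.to_string() }
}

fn main() {
    // SELECT t1.id
    let bare = col("t1", "id");
    // SELECT SUM(id), with `id` assumed to be resolved to `t1.id`
    let agg = Expr::Aggregate { func: "SUM".to_string(), arg: Box::new(col("t1", "id")) };
    // SELECT t1.id * t2.id FROM t1 JOIN t2 USING (id)
    let product = Expr::BinaryExpr {
        left: Box::new(col("t1", "id")),
        op: "*".to_string(),
        right: Box::new(col("t2", "id")),
    };

    for e in [&bare, &agg, &product] {
        // Prints "current -> proposed" for each expression.
        println!("{} -> {}", e.unqualified_name(), e.output_field_name());
    }
}
```

Running it prints `id -> id`, `SUM(id) -> SUM(t1.id)` and `id * id -> t1.id * t2.id`: the bare column is named the same under both rules, while only the proposed rule keeps the two `id` columns in the join distinguishable.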
