jychen7 commented on issue #1507: URL: https://github.com/apache/arrow-datafusion/issues/1507#issuecomment-1017052134
Actually the problem is in neither `datafusion` nor `pydatafusion`; the two just have different expectations. `datafusion` allows duplicate column names in `ans`, but `pydatafusion` raises an error in `create_dataframe` when the input columns are duplicated.

e.g. `select x.c2, y.c2 from x join y using (c1) limit 1` shows

```
+----+----+
| c2 | c2 |
+----+----+
| 1  | 1  |
+----+----+
```

1. I tested in MySQL 5.6 and PostgreSQL 9.6; they also show duplicate column names in the output, e.g. http://sqlfiddle.com/#!9/a6c585/237251 and http://sqlfiddle.com/#!17/bf2fd/25993. This also aligns with "All bare column field names MUST not contain relation/table qualifier." in https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html
2. The dataframe is correct too, since it is not expected to receive duplicate input names. I believe the internal dataframe used in `ctx.sql` has distinct column names like `x.c2` and `y.c2`, but we simplify them on output.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
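
For anyone who wants to reproduce the duplicate-name behaviour locally, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption on my part: the comment above only tested MySQL and PostgreSQL, and this does not touch `datafusion` itself. The `duplicate_names` helper is hypothetical and only illustrates the kind of check that would reject such a result as dataframe input; it is not `pydatafusion`'s actual code.

```python
import sqlite3


def duplicate_names(names):
    """Hypothetical helper: return names that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


# Two tiny tables shaped like the x / y tables in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table x (c1 integer, c2 integer);
    create table y (c1 integer, c2 integer);
    insert into x values (1, 1);
    insert into y values (1, 1);
    """
)

cur = conn.execute("select x.c2, y.c2 from x join y using (c1) limit 1")

# cursor.description holds one 7-tuple per output column; item 0 is the name.
output_names = [col[0] for col in cur.description]
rows = cur.fetchall()

print(output_names, rows)
print("duplicated output names:", duplicate_names(output_names))
print("duplicated qualified names:", duplicate_names(["x.c2", "y.c2"]))
```

With SQLite's default settings the two output columns should both be reported by their bare name, so the helper flags them, while the qualified names `x.c2` and `y.c2` stay distinct. That matches the point above: the names only collide once the qualifier is dropped on output.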
