[GitHub] [arrow-datafusion] jychen7 commented on issue #1507: Python bindings create duplicated qualified fields after joining

GitBox Wed, 19 Jan 2022 18:16:27 -0800


jychen7 commented on issue #1507:
URL: 
https://github.com/apache/arrow-datafusion/issues/1507#issuecomment-1017052134



   Actually, the problem lies in neither `datafusion` nor `pydatafusion`; the two 
just have different expectations.
   
   `datafusion` allows duplicate column names in `ans`, but `pydatafusion` raises 
an error in `create_dataframe` when the input columns are duplicated.
   e.g. `select x.c2, y.c2 from x join y using (c1) limit 1` shows
   ```
   +----+----+
   | c2 | c2 |
   +----+----+
   | 1  | 1  |
   +----+----+
   ```
   
   1. I tested in MySQL 5.6 and PostgreSQL 9.6; they also show duplicate column 
names in the output, e.g. http://sqlfiddle.com/#!9/a6c585/237251 and 
http://sqlfiddle.com/#!17/bf2fd/25993. This also aligns with "All bare column 
field names MUST not contain relation/table qualifier." in 
https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html
   2. The dataframe behavior is correct too, since it is not expected to receive 
duplicate input names. I believe the internal dataframe used in `ctx.sql` has 
distinct column names like `x.c2` and `y.c2`, but we simplify them in the output.
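
To illustrate point 1 without MySQL or PostgreSQL at hand, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in engine (an assumption for illustration; it is not `datafusion` or `pydatafusion`). The helpers `output_column_names` and `duplicated` are hypothetical names introduced here. SQLite does not formally guarantee result column names without an `AS` clause, but with its default settings the bare names come back unqualified, so the same query yields two `c2` columns. Explicit aliases are shown as one possible way to get unique names.

```python
import sqlite3
from collections import Counter


def output_column_names(conn, query):
    """Return the output column names the engine reports for a query."""
    cur = conn.execute(query)
    return [col[0] for col in cur.description]


def duplicated(names):
    """Return the names that occur more than once, in first-seen order."""
    return [name for name, count in Counter(names).items() if count > 1]


# Two tiny tables that share the column names c1 and c2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE x (c1 INTEGER, c2 INTEGER);
    CREATE TABLE y (c1 INTEGER, c2 INTEGER);
    INSERT INTO x VALUES (1, 1);
    INSERT INTO y VALUES (1, 1);
    """
)

# The query from the comment: qualifiers are dropped from the output names.
bare = output_column_names(
    conn, "SELECT x.c2, y.c2 FROM x JOIN y USING (c1) LIMIT 1"
)

# Hypothetical workaround: alias each column so the output names are unique.
aliased = output_column_names(
    conn,
    "SELECT x.c2 AS x_c2, y.c2 AS y_c2 FROM x JOIN y USING (c1) LIMIT 1",
)

print(bare, duplicated(bare))
print(aliased, duplicated(aliased))
```

A consumer that requires unique input column names could run a check like `duplicated` before building a dataframe and either reject the input or ask the caller to alias the columns.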


