fullstart commented on issue #14147:
URL: https://github.com/apache/datafusion/issues/14147#issuecomment-2595296772

   You're right, it works with CSV-created dataframes; I had missed the header part.
   
   But the issue originally arose with parquet files. I tested with parquet this time, and the problem is there:
   ```
      2: from datafusion import SessionContext, lit, col, functions as f
      3: ctx = SessionContext()
      4: x1 = ctx.from_pydict({"id1": [1, 2, 3], "val1": ["a", "b", "c"]})
      5: x2 = ctx.from_pydict({"id1": [2, 3, 4], "val1": ["b", "c", "d"]})
      6: x1.write_parquet("df1.parquet")
      7: x2.write_parquet("df2.parquet")
      8: xf1 = ctx.read_parquet("df1.parquet")
      9: xf2 = ctx.read_parquet("df2.parquet")
     10: xf1
   Out[10]:
   DataFrame()
   +-----+------+
   | id1 | val1 |
   +-----+------+
   | 1   | a    |
   | 2   | b    |
   | 3   | c    |
   +-----+------+
     11: xf2
   Out[11]:
   DataFrame()
   +-----+------+
   | id1 | val1 |
   +-----+------+
   | 2   | b    |
   | 3   | c    |
   | 4   | d    |
   +-----+------+
     12: x1.join(x2, on="id1")
   Out[12]:
   DataFrame()
   +-----+------+-----+------+
   | id1 | val1 | id1 | val1 |
   +-----+------+-----+------+
   | 2   | b    | 2   | b    |
   | 3   | c    | 3   | c    |
   +-----+------+-----+------+
     13: xf1.join(xf2, on="id1")
   ---------------------------------------------------------------------------
   Exception                                 Traceback (most recent call last)
   Cell In[13], line 1
   ----> 1 xf1.join(xf2, on="id1")
   
   File ~\prj\datafusion\venv\Lib\site-packages\datafusion\dataframe.py:468, in DataFrame.join(self, right, on, how, left_on, right_on, join_keys)
       465 if isinstance(right_on, str):
       466     right_on = [right_on]
   --> 468 return DataFrame(self.df.join(right.df, how, left_on, right_on))
   
   Exception: Schema error: Schema contains duplicate qualified field name "?table?".id1
   
   ```
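
The error text suggests that both parquet-backed frames carry the same placeholder qualifier `?table?`, so after the join `id1` would appear twice under one qualified name. The sketch below is a toy model in plain Python, not DataFusion's actual code; `joined_schema` and its parameters are hypothetical names used only to illustrate why identical qualifiers could trigger this kind of error while distinct ones would not.

```python
from collections import Counter


def joined_schema(left_qualifier, left_cols, right_qualifier, right_cols):
    """Combine two column lists into (qualifier, name) pairs.

    Raises ValueError if any qualified pair occurs more than once,
    loosely imitating the check behind the error message above.
    This is an illustrative assumption, not the library's logic.
    """
    fields = [(left_qualifier, c) for c in left_cols]
    fields += [(right_qualifier, c) for c in right_cols]
    # Counter keeps insertion order, so the first duplicate reported
    # is the first duplicated column of the left side.
    duplicates = [f for f, n in Counter(fields).items() if n > 1]
    if duplicates:
        qualifier, name = duplicates[0]
        raise ValueError(
            f'duplicate qualified field name "{qualifier}".{name}'
        )
    return fields


# Distinct qualifiers: the same column names can coexist.
ok = joined_schema("t1", ["id1", "val1"], "t2", ["id1", "val1"])

# Identical qualifiers on both sides: the pairs collide.
try:
    joined_schema("?table?", ["id1", "val1"], "?table?", ["id1", "val1"])
except ValueError as err:
    message = str(err)
```

If this reading is right, giving each parquet-backed frame its own table name before joining might avoid the collision, though that is untested here.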


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

