Takuya Ueshin created SPARK-31441:
-------------------------------------

             Summary: Support duplicated column names for toPandas with Arrow execution.
                 Key: SPARK-31441
                 URL: https://issues.apache.org/jira/browse/SPARK-31441
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Takuya Ueshin


Calling {{toPandas()}} with Arrow execution enabled fails with 
{{ValueError: Found non-unique column index}} when the DataFrame has 
duplicate column names.

{code:python}
>>> spark.sql("select 1 v, 1 v").toPandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 2132, in toPandas
    pdf = table.to_pandas()
  File "pyarrow/array.pxi", line 441, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1367, in pyarrow.lib.Table._to_pandas
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.7/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 653, in table_to_blockmanager
    columns = _deserialize_column_index(table, all_columns, column_indexes)
  File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 704, in _deserialize_column_index
    columns = _flatten_single_level_multiindex(columns)
  File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 937, in _flatten_single_level_multiindex
    raise ValueError('Found non-unique column index')
ValueError: Found non-unique column index
{code}
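
Until this is supported, one possible user-side workaround is to rename the columns to unique placeholder names before the conversion and restore the original names on the pandas side, since pandas itself accepts duplicate column labels. The sketch below is only an illustration under that assumption, not the fix for this issue: the helper names ({{duplicated_names}}, {{positional_names}}, {{to_pandas_allowing_duplicates}}) and the {{col_N}} placeholder scheme are hypothetical, and the only DataFrame members it relies on are {{columns}}, {{toDF}} and {{toPandas}}.

```python
from collections import Counter


def duplicated_names(columns):
    """Return the column names that occur more than once, in first-seen order."""
    counts = Counter(columns)
    seen = []
    for name in columns:
        if counts[name] > 1 and name not in seen:
            seen.append(name)
    return seen


def positional_names(columns):
    """Build unique placeholder names (col_0, col_1, ...), one per column."""
    return ["col_{}".format(i) for i in range(len(columns))]


def to_pandas_allowing_duplicates(df):
    """Hypothetical helper: convert a Spark DataFrame whose column names repeat.

    Assumes `df` is a pyspark.sql.DataFrame; only DataFrame.columns,
    DataFrame.toDF and DataFrame.toPandas are used.
    """
    original = list(df.columns)
    if not duplicated_names(original):
        # No duplicates: the ordinary conversion path is sufficient.
        return df.toPandas()
    # Rename positionally so the Arrow-to-pandas step sees unique names.
    pdf = df.toDF(*positional_names(original)).toPandas()
    # Restore the original (duplicated) labels on the pandas DataFrame.
    pdf.columns = original
    return pdf
```

With a live session, this would be called as {{to_pandas_allowing_duplicates(spark.sql("select 1 v, 1 v"))}}. The rename is positional, so column order is preserved and the duplicated labels end up on the same columns they started on.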


