[
https://issues.apache.org/jira/browse/SPARK-31441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon reassigned SPARK-31441:
------------------------------------
Assignee: Takuya Ueshin
> Support duplicated column names for toPandas with Arrow execution.
> ------------------------------------------------------------------
>
> Key: SPARK-31441
> URL: https://issues.apache.org/jira/browse/SPARK-31441
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Takuya Ueshin
> Assignee: Takuya Ueshin
> Priority: Major
>
> Calling {{toPandas()}} with Arrow execution enabled raises a {{ValueError}}
> when the DataFrame has duplicate column names.
> {code:python}
> >>> spark.sql("select 1 v, 1 v").toPandas()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/path/to/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 2132, in toPandas
>     pdf = table.to_pandas()
>   File "pyarrow/array.pxi", line 441, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/table.pxi", line 1367, in pyarrow.lib.Table._to_pandas
>   File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.7/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 653, in table_to_blockmanager
>     columns = _deserialize_column_index(table, all_columns, column_indexes)
>   File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 704, in _deserialize_column_index
>     columns = _flatten_single_level_multiindex(columns)
>   File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 937, in _flatten_single_level_multiindex
>     raise ValueError('Found non-unique column index')
> ValueError: Found non-unique column index
> {code}
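
Until the issue is resolved, one possible caller-side workaround is to give
every column a unique positional name before the Arrow conversion and put the
original (possibly duplicated) names back on the resulting pandas DataFrame.
The sketch below illustrates that idea only; it is not taken from the issue or
from any patch. The helper names temp_column_names and
to_pandas_with_duplicates and the col_<i> naming scheme are hypothetical. It
assumes df is a PySpark DataFrame and relies on DataFrame.columns,
DataFrame.toDF(*names) and DataFrame.toPandas().

```python
def temp_column_names(columns):
    """Return one unique placeholder name per column, based on position."""
    return ["col_{}".format(i) for i in range(len(columns))]


def to_pandas_with_duplicates(df):
    """Hypothetical helper: convert a Spark DataFrame whose column names may repeat.

    Assumes `df` is a pyspark.sql.DataFrame (or any object exposing the same
    `columns`, `toDF` and `toPandas` members).
    """
    original = list(df.columns)
    # Rename to unique placeholders so the Arrow conversion sees no duplicates.
    pdf = df.toDF(*temp_column_names(original)).toPandas()
    # pandas accepts repeated column labels, so the original names can be restored.
    pdf.columns = original
    return pdf
```

With an active session, the failing example from the description would be
invoked as to_pandas_with_duplicates(spark.sql("select 1 v, 1 v")). Positional
placeholders are used here because they stay unique regardless of what the
original names are.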