[ https://issues.apache.org/jira/browse/SPARK-55059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-55059:
-----------------------------------
Labels: pull-request-available (was: )
> Remove empty table workaround in toPandas
> -----------------------------------------
>
> Key: SPARK-55059
> URL: https://issues.apache.org/jira/browse/SPARK-55059
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
> Labels: pull-request-available
>
> SPARK-51112 added a workaround in {{_convert_arrow_table_to_pandas()}} to
> avoid a segfault when converting empty tables with nested array columns:
> {code:python}
> # SPARK-51112: If the table is empty, we avoid using pyarrow to_pandas to create the
> # DataFrame, as it may fail with a segmentation fault.
> if arrow_table.num_rows == 0:
>     column_data = (
>         pd.Series([], name=temp_col_names[i], dtype="object")
>         for i in range(len(schema.fields))
>     )
> {code}
> This workaround is no longer necessary after SPARK-55056, which fixed the
> root cause in {{ArrayWriter.finish()}} by properly initializing the Arrow
> ListArray offset buffer when {{count == 0}}.
> Proposal: Remove the SPARK-51112 workaround and let pyarrow handle empty
> tables directly.
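
For context on the root cause mentioned above: as I understand the Arrow columnar format, a variable-size list array with n entries carries an offsets buffer of n + 1 integers, so an empty list array is still expected to hold a single 0 offset. The pure-Python sketch below only illustrates that invariant. `build_list_offsets` and `row_slices` are hypothetical helpers written for this note; they are not Spark's `ArrayWriter` or any pyarrow API.

```python
from array import array
from typing import List, Sequence, Tuple


def build_list_offsets(rows: Sequence[Sequence[int]]) -> array:
    """Return signed-int offsets for a list column: always len(rows) + 1 entries.

    Hypothetical helper for illustration only.
    """
    # The leading 0 is written up front, so it exists even when there are no rows.
    offsets = array("i", [0])
    for row in rows:
        # Each new offset is the previous end position plus this row's length.
        offsets.append(offsets[-1] + len(row))
    return offsets


def row_slices(offsets: array) -> List[Tuple[int, int]]:
    """Recover (start, end) positions per row from an offsets buffer."""
    # A buffer holding only the initial 0 yields no rows at all.
    return [(offsets[i], offsets[i + 1]) for i in range(len(offsets) - 1)]


empty = build_list_offsets([])
filled = build_list_offsets([[1, 2], [], [3]])
```

A reader that assumes the first offset is always present would read past the end of an uninitialized buffer in the empty case, which is one plausible way a missing initial offset could surface as a crash rather than as an error.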