[
https://issues.apache.org/jira/browse/SPARK-55674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yicong Huang updated SPARK-55674:
---------------------------------
Description:
Follow-up to SPARK-55600 (PR #54382).
Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
applied to similar places:
* {{python/pyspark/sql/connect/session.py}} around line 626, which uses
{{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
* -{{python/pyspark/sql/conversion.py}} around line 289, which uses
{{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
Here the schema may have more fields while the DataFrame itself is empty, so the
Arrow table must be created from the schema rather than converted from the data;
this case therefore does not apply.
These should use the same approach introduced in SPARK-55600 for consistency.
was:
Follow-up to SPARK-55600 (PR #54382).
Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
applied to similar places:
* {{python/pyspark/sql/connect/session.py}} around line 626, which uses
{{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
-* {{python/pyspark/sql/conversion.py}} around line 289, which uses
{{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
Here the schema may have more fields while the DataFrame itself is empty, so the
Arrow table must be created from the schema rather than converted from the data;
this case therefore does not apply.
These should use the same approach introduced in SPARK-55600 for consistency.
> Apply consistent 0-column pandas-to-Arrow fix
> ---------------------------------------------
>
> Key: SPARK-55674
> URL: https://issues.apache.org/jira/browse/SPARK-55674
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
>
> Follow-up to SPARK-55600 (PR #54382).
> Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
> applied to similar places:
> * {{python/pyspark/sql/connect/session.py}} around line 626, which uses
> {{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
> * -{{python/pyspark/sql/conversion.py}} around line 289, which uses
> {{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
>
> Here the schema may have more fields while the DataFrame itself is empty, so
> the Arrow table must be created from the schema rather than converted from the
> data; this case therefore does not apply.
> These should use the same approach introduced in SPARK-55600 for consistency.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]