[
https://issues.apache.org/jira/browse/SPARK-55674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yicong Huang updated SPARK-55674:
---------------------------------
Description:
Follow-up to SPARK-55600 (PR #54382).
Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
applied to similar places:
* {{python/pyspark/sql/connect/session.py}} around line 626, which uses
{{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
* -{{python/pyspark/sql/conversion.py}} around line 289, which uses
{{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
Here the schema may have more fields while the DataFrame itself is empty, so the
Arrow table must be created from the schema rather than converted from the data;
this case therefore does not apply.
These should use the same approach introduced in SPARK-55600 for consistency.
was:
Follow-up to SPARK-55600 (PR #54382).
Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
applied to similar places:
* {{python/pyspark/sql/connect/session.py}} around line 626, which uses
{{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
-* {{python/pyspark/sql/conversion.py}} around line 289, which uses
{{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
Here the schema may have more fields while the DataFrame itself is empty, so the
Arrow table must be created from the schema rather than converted from the data;
this case therefore does not apply.
These should use the same approach introduced in SPARK-55600 for consistency.
> Apply consistent 0-column pandas-to-Arrow fix
> ---------------------------------------------
>
> Key: SPARK-55674
> URL: https://issues.apache.org/jira/browse/SPARK-55674
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Yicong Huang
> Priority: Major
>
> Follow-up to SPARK-55600 (PR #54382).
> Reviewer ueshin noted that the 0-column pandas-to-Arrow fix should also be
> applied to similar places:
> * {{python/pyspark/sql/connect/session.py}} around line 626, which uses
> {{pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([])))}}
> * -{{python/pyspark/sql/conversion.py}} around line 289, which uses
> {{pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))}}-
>
> Here the schema may have more fields while the DataFrame itself is empty, so
> the Arrow table must be created from the schema rather than converted from the
> data; this case therefore does not apply.
> These should use the same approach introduced in SPARK-55600 for consistency.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]