[ 
https://issues.apache.org/jira/browse/SPARK-44486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-44486:
---------------------------------
    Description: 
Implement PyArrow `self_destruct` feature for `toPandas`

Make the Spark configuration 
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled` enable PyArrow’s 
`self_destruct` feature in Spark Connect. This can save memory when creating 
a Pandas DataFrame via `toPandas`, because Arrow-allocated memory is freed 
while the Pandas DataFrame is being built.

  was:
Implement PyArrow `self_destruct` feature for `toPandas`

 

Now the Spark configuration 
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable 
PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when 
creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory 
while building the Pandas DataFrame. 


> Implement PyArrow `self_destruct` feature for `toPandas`
> --------------------------------------------------------
>
>                 Key: SPARK-44486
>                 URL: https://issues.apache.org/jira/browse/SPARK-44486
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> Implement PyArrow `self_destruct` feature for `toPandas`
>
> Make the Spark configuration 
> `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` enable PyArrow’s 
> `self_destruct` feature in Spark Connect. This can save memory when 
> creating a Pandas DataFrame via `toPandas`, because Arrow-allocated memory 
> is freed while the Pandas DataFrame is being built.
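
For illustration, here is a minimal sketch of how the configuration flag could be mapped to PyArrow conversion options. The helper names `arrow_to_pandas_options` and `table_to_pandas` are hypothetical, and this is not the Spark Connect implementation. `self_destruct`, `split_blocks`, and `use_threads` are existing keyword arguments of `pyarrow.Table.to_pandas`; combining all three is an assumption based on PyArrow's memory-usage guidance.

```python
from typing import Any, Dict

# Configuration key named in this issue.
SELF_DESTRUCT_CONF = "spark.sql.execution.arrow.pyspark.selfDestruct.enabled"


def arrow_to_pandas_options(conf: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: translate the Spark conf into to_pandas kwargs."""
    enabled = conf.get(SELF_DESTRUCT_CONF, "false").lower() == "true"
    if not enabled:
        # Default conversion: Arrow buffers stay alive until the table is freed.
        return {}
    return {
        # Let PyArrow release each column's Arrow memory as it is converted.
        "self_destruct": True,
        # One pandas block per column, so columns can be freed one at a time.
        "split_blocks": True,
        # Assumed here: single-threaded conversion, so that memory is released
        # column by column instead of many columns being in flight at once.
        "use_threads": False,
    }


def table_to_pandas(table: Any, conf: Dict[str, str]) -> Any:
    """Hypothetical helper: convert a pyarrow.Table using the options above.

    With self_destruct enabled, the table must not be used after this call,
    so the caller should drop its own reference as well.
    """
    options = arrow_to_pandas_options(conf)
    return table.to_pandas(**options)
```

From the user's side, the intent of this issue is that calling `spark.conf.set("spark.sql.execution.arrow.pyspark.selfDestruct.enabled", "true")` before `df.toPandas()` takes effect in Spark Connect as well. PyArrow documents `self_destruct` as experimental, and memory savings are not guaranteed for every table.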



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
