bhaskarkvvsr opened a new issue, #14037:
URL: https://github.com/apache/arrow/issues/14037
I am trying to convert a Spark DataFrame to a pandas DataFrame after enabling these two flags:
```
'spark.sql.execution.arrow.pyspark.enabled'
'spark.sql.execution.arrow.pyspark.fallback.enabled'
```
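For context, these flags are set on the `SparkSession` builder. A minimal configuration sketch of the setup I'm using (the app name and sample data here are just illustrative):

```python
from pyspark.sql import SparkSession

# Illustrative session; the two Arrow flags from above are set here.
spark = (
    SparkSession.builder
    .appName("arrow-topandas-repro")
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .config("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
pdf = df.toPandas()  # this call raises the Py4JError shown below
```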
But the conversion fails with the following error:
```
File /opt/conda/envs/python385/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:108, in PandasConversionMixin.toPandas(self)
    106 # Rename columns to avoid duplicated column names.
    107 tmp_column_names = ['col_{}'.format(i) for i in range(len(self.columns))]
--> 108 self_destruct = self.sql_ctx._conf.arrowPySparkSelfDestructEnabled()
    109 batches = self.toDF(*tmp_column_names)._collect_as_arrow(
    110     split_batches=self_destruct)
    111 if len(batches) > 0:

Py4JError: An error occurred while calling o1723.arrowPySparkSelfDestructEnabled. Trace:
py4j.Py4JException: Method arrowPySparkSelfDestructEnabled([]) does not exist
```
Environment:
- `pyarrow` installed through conda-forge
- Python version: 3.8.5
- PySpark version: 3.2.1
- PyArrow version: 8.0.0
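To confirm the versions above programmatically, a small helper (the function name is my own) can query installed package metadata without importing the packages themselves:

```python
from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

def pkg_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

for pkg in ("pyspark", "pyarrow"):
    print(pkg, pkg_version(pkg))
```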
OS details:
```
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
```