[
https://issues.apache.org/jira/browse/SPARK-47068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-47068.
---------------------------------
> Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch
> ----------------------------------------------------------------------
>
> Key: SPARK-47068
> URL: https://issues.apache.org/jira/browse/SPARK-47068
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.1, 3.5.0, 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.5.2, 3.4.3, 4.0.0
>
>
> {code}
> import pandas as pd
> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
> spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
> spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
> spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
> spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
> spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
> {code}
> With maxRecordsPerBatch set to 0, createDataFrame fails:
> {code}
> /.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning:
> createDataFrame attempted Arrow optimization because
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached
> the error below and will not continue because automatic fallback with
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
> range() arg 3 must not be zero
> warn(msg)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 1483, in createDataFrame
>     return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
>   File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in createDataFrame
>     return self._create_from_pandas_with_arrow(data, schema, timezone)
>   File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in _create_from_pandas_with_arrow
>     pdf_slices = (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))
> ValueError: range() arg 3 must not be zero
> {code}
> With maxRecordsPerBatch set to -1, no error is raised but the result is empty:
> {code}
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
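
Both symptoms above follow from the batch size being passed straight to range() as the step: a step of 0 raises ValueError, and a step of -1 yields no start offsets, so no slices are produced. The following is a minimal sketch of one way to guard against that, assuming a non-positive value is meant to disable the per-batch limit. It uses plain lists instead of a pandas DataFrame, and slice_batches is a hypothetical helper written for illustration, not the code from the actual Spark fix.

```python
def slice_batches(rows, max_records_per_batch):
    """Split rows into consecutive batches.

    Assumption: a value <= 0 means "no limit", so all rows go into one batch.
    """
    n = len(rows)
    # Fall back to the full length when the configured value is not positive.
    step = max_records_per_batch if max_records_per_batch > 0 else n
    # Keep the step at least 1 so range() never receives 0 for empty input.
    step = max(step, 1)
    return [rows[start:start + step] for start in range(0, n, step)]


# The two failure modes from the report, reproduced with range() alone.
try:
    range(0, 1, 0)
    zero_step_error = None
except ValueError as exc:
    zero_step_error = str(exc)

# A negative step from 0 towards 1 produces no offsets at all.
negative_step_offsets = list(range(0, 1, -1))

# With the guard in place, 0 and -1 both yield a single batch.
batches_zero = slice_batches([123], 0)
batches_negative = slice_batches([123], -1)
batches_limited = slice_batches([1, 2, 3], 2)
```

Falling back to the full length keeps the slicing expression unchanged for positive values and only alters the 0 and -1 cases.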