This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.3 by this push:
new b7d8ddbf586 [SPARK-44184][PYTHON][DOCS] Remove a wrong doc about
`ARROW_PRE_0_15_IPC_FORMAT`
b7d8ddbf586 is described below
commit b7d8ddbf586ee076b21b1c501ef44a22c8ce11f2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sun Jun 25 18:53:25 2023 -0700
[SPARK-44184][PYTHON][DOCS] Remove a wrong doc about
`ARROW_PRE_0_15_IPC_FORMAT`
### What changes were proposed in this pull request?
This PR removes incorrect documentation about
`ARROW_PRE_0_15_IPC_FORMAT`.
### Why are the changes needed?
Since Apache Spark 3.0.0, Spark has not allowed the `ARROW_PRE_0_15_IPC_FORMAT`
environment variable at all.
https://github.com/apache/spark/blob/2407183cb8637b6ac2d1b76320cae9cbde3411da/python/pyspark/sql/pandas/utils.py#L69-L73
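For readers who do not follow the link, here is a minimal sketch of the kind of guard that makes the removed documentation wrong. It is not the Spark source: the helper name `reject_legacy_arrow_ipc`, its `env` parameter, and the error text are assumptions for illustration, and it assumes the check triggers only when the variable is set to `1`.

```python
import os
from typing import Mapping, Optional

# Variable name taken from the commit message above.
LEGACY_FLAG = "ARROW_PRE_0_15_IPC_FORMAT"


def reject_legacy_arrow_ipc(env: Optional[Mapping[str, str]] = None) -> None:
    """Hypothetical helper: fail fast if the legacy Arrow IPC flag is enabled.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without mutating os.environ.
    """
    env = os.environ if env is None else env
    # Assumption: only the value "1" counts as "enabled".
    if env.get(LEGACY_FLAG) == "1":
        raise RuntimeError(
            f"Legacy Arrow IPC format is not supported; please unset {LEGACY_FLAG}"
        )
```

With a guard of this shape, following the removed instructions (exporting the variable in `conf/spark-env.sh`) would cause an error instead of enabling a compatibility mode.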
### Does this PR introduce _any_ user-facing change?
No. This removes outdated, incorrect documentation.
### How was this patch tested?
Manual review.
Closes #41730 from dongjoon-hyun/SPARK-44184.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 00e7c08606d0b6de22604d2a7350ea0711355300)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/docs/source/user_guide/sql/arrow_pandas.rst | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/python/docs/source/user_guide/sql/arrow_pandas.rst b/python/docs/source/user_guide/sql/arrow_pandas.rst
index 9675b1096f0..ac71649f1de 100644
--- a/python/docs/source/user_guide/sql/arrow_pandas.rst
+++ b/python/docs/source/user_guide/sql/arrow_pandas.rst
@@ -391,25 +391,6 @@ For usage with pyspark.sql, the minimum supported versions of Pandas is 1.0.5 an
Higher versions may be used, however, compatibility and data correctness can not be guaranteed and should
be verified by the user.
-Compatibility Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be
-compatible with previous versions of Arrow <= 0.14.1. This is only necessary to do for PySpark
-users with versions 2.3.x and 2.4.x that have manually upgraded PyArrow to 0.15.0. The following
-can be added to ``conf/spark-env.sh`` to use the legacy Arrow IPC format:
-
-.. code-block:: bash
-
- ARROW_PRE_0_15_IPC_FORMAT=1
-
-
-This will instruct PyArrow >= 0.15.0 to use the legacy IPC format with the older Arrow Java that
-is in Spark 2.3.x and 2.4.x. Not setting this environment variable will lead to a similar error as
-described in `SPARK-29367 <https://issues.apache.org/jira/browse/SPARK-29367>`_ when running
-``pandas_udf``\s or :meth:`DataFrame.toPandas` with Arrow enabled. More information about the Arrow IPC change can
-be read on the Arrow 0.15.0 release `blog <https://arrow.apache.org/blog/2019/10/06/0.15.0-release/#columnar-streaming-protocol-change-since-0140>`_.
-
Setting Arrow ``self_destruct`` for memory savings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~