This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f893a19 [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some
more information in installation guide
f893a19 is described below
commit f893a19c4cf62dd13bf179de75af6feb677c4154
Author: HyukjinKwon <[email protected]>
AuthorDate: Sun Sep 20 10:58:17 2020 +0900
[SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more
information in installation guide
### What changes were proposed in this pull request?
This PR:
- rephrases some wording in the installation guide to avoid using terms
that can be potentially ambiguous, such as "different flavors"
- documents extra dependency installation `pip install pyspark[sql]`
- uses the link that corresponds to the released version. e.g.)
https://spark.apache.org/docs/latest/building-spark.html vs
https://spark.apache.org/docs/3.0.0/building-spark.html
- adds some more details
I built it on Read the Docs to make it easier to review:
https://hyukjin-spark.readthedocs.io/en/stable/getting_started/install.html
### Why are the changes needed?
To improve the installation guide.
### Does this PR introduce _any_ user-facing change?
Yes, it updates the user-facing installation guide.
### How was this patch tested?
Manually built the doc and tested.
Closes #29779 from HyukjinKwon/SPARK-32180.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
python/docs/source/conf.py | 6 +-
python/docs/source/getting_started/index.rst | 2 +-
python/docs/source/getting_started/install.rst | 138 +++++++++++++++++++++
.../docs/source/getting_started/installation.rst | 114 -----------------
python/setup.py | 3 +
5 files changed, 147 insertions(+), 116 deletions(-)
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 738765a..9d87bbe 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -57,7 +57,11 @@ rst_epilog = """
.. _binder:
https://mybinder.org/v2/gh/apache/spark/{0}?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb
.. |examples| replace:: Examples
.. _examples: https://github.com/apache/spark/tree/{0}/examples/src/main/python
-""".format(os.environ.get("RELEASE_TAG", "master"))
+.. |downloading| replace:: Downloading
+.. _downloading: https://spark.apache.org/docs/{1}/#downloading
+.. |building_spark| replace:: Building Spark
+.. _building_spark: https://spark.apache.org/docs/{1}/building-spark.html
+""".format(os.environ.get("RELEASE_TAG", "master"),
os.environ.get('RELEASE_VERSION', "latest"))
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
diff --git a/python/docs/source/getting_started/index.rst
b/python/docs/source/getting_started/index.rst
index 0f3cea7..9fa3352 100644
--- a/python/docs/source/getting_started/index.rst
+++ b/python/docs/source/getting_started/index.rst
@@ -25,5 +25,5 @@ This page summarizes the basic steps required to setup and get started with PySpark
.. toctree::
:maxdepth: 2
- installation
+ install
quickstart
diff --git a/python/docs/source/getting_started/install.rst
b/python/docs/source/getting_started/install.rst
new file mode 100644
index 0000000..03570e6
--- /dev/null
+++ b/python/docs/source/getting_started/install.rst
@@ -0,0 +1,138 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster, rather than setting up a cluster itself.
+
+This page includes instructions for installing PySpark by using pip or Conda, downloading manually,
+and building from source.
+
+
+Python Version Supported
+------------------------
+
+Python 3.6 and above.
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as
follows:
+
+.. code-block:: bash
+
+ pip install pyspark
+
+If you want to install extra dependencies for a specific component, such as Spark SQL, you can install them as below:
+
+.. code-block:: bash
+
+ pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system
which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both
cross-platform and
+language agnostic. In practice, Conda can replace both `pip
<https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create a new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+ conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list
of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+ conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+ conda activate pyspark_env
+
+You can then install PySpark in the newly created environment by following `Using PyPI <#using-pypi>`_,
+for example as below. This installs PySpark into the new virtual environment
+``pyspark_env`` created above.
+
+.. code-block:: bash
+
+ pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+ conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+synced with the PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark
website <https://spark.apache.org/downloads.html>`_.
+You can download the distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark, for example, as below:
+
+.. code-block:: bash
+
+ tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
+
+Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
+Update the ``PYTHONPATH`` environment variable so that it can find the PySpark and Py4J libraries under ``$SPARK_HOME/python/lib``.
+One example of doing this is shown below:
+
+.. code-block:: bash
+
+ cd spark-3.0.0-bin-hadoop2.7
+ export SPARK_HOME=`pwd`
+ export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
+
+
+Installing from Source
+----------------------
+
+To install PySpark from source, refer to |building_spark|_.
+
+
+Dependencies
+------------
+
+============= ========================= ================
+Package Minimum supported version Note
+============= ========================= ================
+`pandas` 0.23.2 Optional for SQL
+`NumPy` 1.7 Required for ML
+`pyarrow` 0.15.1 Optional for SQL
+`Py4J` 0.10.9 Required
+============= ========================= ================
+
+Note that PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.
+If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow-related features, and refer
+to |downloading|_.
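For the JDK 11 case, one hedged way to pass that JVM flag (a sketch, not the only option; it assumes you edit ``conf/spark-defaults.conf`` in your Spark distribution) is:

```text
spark.driver.extraJavaOptions   -Dio.netty.tryReflectionSetAccessible=true
spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true
```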
diff --git a/python/docs/source/getting_started/installation.rst
b/python/docs/source/getting_started/installation.rst
deleted file mode 100644
index 914045e..0000000
--- a/python/docs/source/getting_started/installation.rst
+++ /dev/null
@@ -1,114 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
-.. http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-
-============
-Installation
-============
-
-Official releases are available from the `Apache Spark website
<https://spark.apache.org/downloads.html>`_.
-Alternatively, you can install it via ``pip`` from PyPI. PyPI installation is
usually for standalone
-locally or as a client to connect to a cluster instead of setting a cluster
up.
-
-This page includes the instructions for installing PySpark by using pip,
Conda, downloading manually, and building it from the source.
-
-Python Version Supported
-------------------------
-
-Python 3.6 and above.
-
-Using PyPI
-----------
-
-PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_
-
-.. code-block:: bash
-
- pip install pyspark
-
-Using Conda
------------
-
-Conda is an open-source package management and environment management system
which is a part of the `Anaconda <https://docs.continuum.io/anaconda/>`_
distribution. It is both cross-platform and language agnostic.
-
-Conda can be used to create a virtual environment from terminal as shown below:
-
-.. code-block:: bash
-
- conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list
of Conda environments which can be seen using the following command:
-
-.. code-block:: bash
-
- conda env list
-
-The newly created environment can be accessed using the following command:
-
-.. code-block:: bash
-
- conda activate pyspark_env
-
-In Conda version earlier than 4.4, the following command should be used:
-
-.. code-block:: bash
-
- source activate pyspark_env
-
-Refer to `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
environment.
-
-Note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is
available but not necessarily synced with PySpark release cycle because it is
maintained by the community separately.
-
-Official Release Channel
-------------------------
-
-Different flavors of PySpark are available in the `Apache Spark website
<https://spark.apache.org/downloads.html>`_.
-Any suitable version can be downloaded and extracted as below:
-
-.. code-block:: bash
-
- tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
-
-Ensure the `SPARK_HOME` environment variable points to the directory where the
code has been extracted.
-Define `PYTHONPATH` such that it can find the PySpark and Py4J under
`SPARK_HOME/python/lib`.
-One example of doing this is shown below:
-
-.. code-block:: bash
-
- cd spark-3.0.0-bin-hadoop2.7
- export SPARK_HOME=`pwd`
- export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo
"${ZIPS[*]}"):$PYTHONPATH
-
-Installing from Source
-----------------------
-
-To install PySpark from source, refer to `Building Spark
<https://spark.apache.org/docs/latest/building-spark.html>`_.
-
-Refer to `steps above <#official-release-channel>`_ to define ``PYTHONPATH``.
-
-Dependencies
-------------
-============= ========================= ================
-Package Minimum supported version Note
-============= ========================= ================
-`pandas` 0.23.2 Optional for SQL
-`NumPy` 1.7 Required for ML
-`pyarrow` 0.15.1 Optional for SQL
-`Py4J` 0.10.9 Required
-============= ========================= ================
-
-**Note**: PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.
-If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow
related features and refer to `Downloading
<https://spark.apache.org/docs/latest/#downloading>`_
\ No newline at end of file
diff --git a/python/setup.py b/python/setup.py
index b4cc24a..7fac7b3 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -99,6 +99,7 @@ if (in_spark):
# If you are changing the versions here, please also change
./python/pyspark/sql/pandas/utils.py
# For Arrow, you should also check ./pom.xml and ensure there are no breaking
changes in the
# binary format protocol with the Java version, see ARROW_HOME/format/* for
specifications.
+# Also don't forget to update python/docs/source/getting_started/install.rst.
_minimum_pandas_version = "0.23.2"
_minimum_pyarrow_version = "1.0.0"
@@ -203,6 +204,8 @@ try:
'pyspark.examples.src.main.python': ['*.py', '*/*.py']},
scripts=scripts,
license='http://www.apache.org/licenses/LICENSE-2.0',
+ # Don't forget to update python/docs/source/getting_started/install.rst
+ # if you're updating the versions or dependencies.
install_requires=['py4j==0.10.9'],
extras_require={
'ml': ['numpy>=1.7'],
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]