This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f893a19 [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some
more information in installation guide
f893a19 is described below
commit f893a19c4cf62dd13bf179de75af6feb677c4154
Author: HyukjinKwon <[email protected]>
AuthorDate: Sun Sep 20 10:58:17 2020 +0900
[SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more
information in installation guide
### What changes were proposed in this pull request?
This PR:
- rephrases some wording in the installation guide to avoid using terms
that can be potentially ambiguous, such as "different flavors"
- documents extra dependency installation `pip install pyspark[sql]`
- uses the link that corresponds to the released version. e.g.)
https://spark.apache.org/docs/latest/building-spark.html vs
https://spark.apache.org/docs/3.0.0/building-spark.html
- adds some more details
I built it on Read the Docs to make it easier to review:
https://hyukjin-spark.readthedocs.io/en/stable/getting_started/install.html
### Why are the changes needed?
To improve the installation guide.
### Does this PR introduce _any_ user-facing change?
Yes, it updates the user-facing installation guide.
### How was this patch tested?
Manually built the doc and tested.
Closes #29779 from HyukjinKwon/SPARK-32180.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
python/docs/source/conf.py | 6 +-
python/docs/source/getting_started/index.rst | 2 +-
python/docs/source/getting_started/install.rst | 138 +++++++++++++++++++++
.../docs/source/getting_started/installation.rst | 114 -----------------
python/setup.py | 3 +
5 files changed, 147 insertions(+), 116 deletions(-)
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 738765a..9d87bbe 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -57,7 +57,11 @@ rst_epilog = """
.. _binder:
https://mybinder.org/v2/gh/apache/spark/{0}?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb
.. |examples| replace:: Examples
.. _examples: https://github.com/apache/spark/tree/{0}/examples/src/main/python
-""".format(os.environ.get("RELEASE_TAG", "master"))
+.. |downloading| replace:: Downloading
+.. _downloading: https://spark.apache.org/docs/{1}/#downloading
+.. |building_spark| replace:: Building Spark
+.. _building_spark: https://spark.apache.org/docs/{1}/building-spark.html
+""".format(os.environ.get("RELEASE_TAG", "master"),
os.environ.get('RELEASE_VERSION', "latest"))
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
diff --git a/python/docs/source/getting_started/index.rst
b/python/docs/source/getting_started/index.rst
index 0f3cea7..9fa3352 100644
--- a/python/docs/source/getting_started/index.rst
+++ b/python/docs/source/getting_started/index.rst
@@ -25,5 +25,5 @@ This page summarizes the basic steps required to setup and get started with PySpark
.. toctree::
:maxdepth: 2
- installation
+ install
quickstart
diff --git a/python/docs/source/getting_started/install.rst
b/python/docs/source/getting_started/install.rst
new file mode 100644
index 0000000..03570e6
--- /dev/null
+++ b/python/docs/source/getting_started/install.rst
@@ -0,0 +1,138 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster, rather than setting up a cluster itself.
+
+This page includes instructions for installing PySpark by using pip or Conda, downloading manually,
+and building from source.
+
+
+Python Version Supported
+------------------------
+
+Python 3.6 and above.
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as
follows:
+
+.. code-block:: bash
+
+ pip install pyspark
+
+If you want to install extra dependencies for a specific component, such as Spark SQL, you can install them as below:
+
+.. code-block:: bash
+
+ pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system
which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both
cross-platform and
+language agnostic. In practice, Conda can replace both `pip
<https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create a new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+ conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list
of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+ conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+ conda activate pyspark_env
+
+You can then install PySpark in the newly created environment by following `Using PyPI <#using-pypi>`_,
+for example as below. This installs PySpark into the new virtual environment
+``pyspark_env`` created above.
+
+.. code-block:: bash
+
+ pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+ conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+synced with the PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark
website <https://spark.apache.org/downloads.html>`_.
+You can download the distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark, for example, as below:
+
+.. code-block:: bash
+
+ tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
+
+Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
+Update the ``PYTHONPATH`` environment variable so that it can find the PySpark and Py4J libraries under ``$SPARK_HOME/python/lib``.
+One example of doing this is shown below:
+
+.. code-block:: bash
+
+ cd spark-3.0.0-bin-hadoop2.7
+ export SPARK_HOME=`pwd`
+ export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
+
+
+Installing from Source
+----------------------
+
+To install PySpark from source, refer to |building_spark|_.
+
+
+Dependencies
+------------
+
+============= ========================= ================
+Package Minimum supported version Note
+============= ========================= ================
+`pandas` 0.23.2 Optional for SQL
+`NumPy` 1.7 Required for ML
+`pyarrow` 0.15.1 Optional for SQL
+`Py4J` 0.10.9 Required
+============= ========================= ================
+
+Note that PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.
+If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow-related features, and refer
+to |downloading|_.
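For the JDK 11 case, one hedged way to pass that JVM flag (a sketch, not the only option; it assumes you edit ``conf/spark-defaults.conf`` in your Spark distribution) is:

```text
spark.driver.extraJavaOptions   -Dio.netty.tryReflectionSetAccessible=true
spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true
```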
diff --git a/python/docs/source/getting_started/installation.rst
b/python/docs/source/getting_started/installation.rst
deleted file mode 100644
index 914045e..0000000
--- a/python/docs/source/getting_started/installation.rst
+++ /dev/null
@@ -1,114 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
-.. http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-
-============
-Installation
-============
-
-Official releases are available from the `Apache Spark website
<https://spark.apache.org/downloads.html>`_.
-Alternatively, you can install it via ``pip`` from PyPI. PyPI installation is
usually for standalone
-locally or as a client to connect to a cluster instead of setting a cluster
up.
-
-This page includes the instructions for installing PySpark by using pip,
Conda, downloading manually, and building it from the source.
-
-Python Version Supported
-------------------------
-
-Python 3.6 and above.
-
-Using PyPI
-----------
-
-PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_
-
-.. code-block:: bash
-
- pip install pyspark
-
-Using Conda
------------
-
-Conda is an open-source package management and environment management system
which is a part of the `Anaconda <https://docs.continuum.io/anaconda/>`_
distribution. It is both cross-platform and language agnostic.
-
-Conda can be used to create a virtual environment from terminal as shown below:
-
-.. code-block:: bash
-
- conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list
of Conda environments which can be seen using the following command:
-
-.. code-block:: bash
-
- conda env list
-
-The newly created environment can be accessed using the following command:
-
-.. code-block:: bash
-
- conda activate pyspark_env
-
-In Conda version earlier than 4.4, the following command should be used:
-
-.. code-block:: bash
-
- source activate pyspark_env
-
-Refer to `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
environment.
-
-Note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is
available but not necessarily synced with PySpark release cycle because it is
maintained by the community separately.
-
-Official Release Channel
-------------------------
-
-Different flavors of PySpark are available in the `Apache Spark website
<https://spark.apache.org/downloads.html>`_.
-Any suitable version can be downloaded and extracted as below:
-
-.. code-block:: bash
-
- tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
-
-Ensure the `SPARK_HOME` environment variable points to the directory where the
code has been extracted.
-Define `PYTHONPATH` such that it can find the PySpark and Py4J under
`SPARK_HOME/python/lib`.
-One example of doing this is shown below:
-
-.. code-block:: bash
-
- cd spark-3.0.0-bin-hadoop2.7
- export SPARK_HOME=`pwd`
- export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo
"${ZIPS[*]}"):$PYTHONPATH
-
-Installing from Source
-----------------------
-
-To install PySpark from source, refer to `Building Spark
<https://spark.apache.org/docs/latest/building-spark.html>`_.
-
-Refer to `steps above <#official-release-channel>`_ to define ``PYTHONPATH``.
-
-Dependencies
-------------
-============= ========================= ================
-Package Minimum supported version Note
-============= ========================= ================
-`pandas` 0.23.2 Optional for SQL
-`NumPy` 1.7 Required for ML
-`pyarrow` 0.15.1 Optional for SQL
-`Py4J` 0.10.9 Required
-============= ========================= ================
-
-**Note**: PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.
-If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow
related features and refer to `Downloading
<https://spark.apache.org/docs/latest/#downloading>`_
\ No newline at end of file
diff --git a/python/setup.py b/python/setup.py
index b4cc24a..7fac7b3 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -99,6 +99,7 @@ if (in_spark):
# If you are changing the versions here, please also change
./python/pyspark/sql/pandas/utils.py
# For Arrow, you should also check ./pom.xml and ensure there are no breaking
changes in the
# binary format protocol with the Java version, see ARROW_HOME/format/* for
specifications.
+# Also don't forget to update python/docs/source/getting_started/install.rst.
_minimum_pandas_version = "0.23.2"
_minimum_pyarrow_version = "1.0.0"
@@ -203,6 +204,8 @@ try:
'pyspark.examples.src.main.python': ['*.py', '*/*.py']},
scripts=scripts,
license='http://www.apache.org/licenses/LICENSE-2.0',
+ # Don't forget to update python/docs/source/getting_started/install.rst
+ # if you're updating the versions or dependencies.
install_requires=['py4j==0.10.9'],
extras_require={
'ml': ['numpy>=1.7'],
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]