[spark] branch branch-3.1 updated: [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option

gurwls223 Tue, 05 Jan 2021 00:30:37 -0800

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 62838cc  [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option
62838cc is described below

commit 62838cc71c839acd176af983799b027b09ca2c2f
Author: HyukjinKwon <[email protected]>
AuthorDate: Tue Jan 5 17:21:32 2021 +0900

    [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option
    
    ### What changes were proposed in this pull request?
    
    This PR is a followup of https://github.com/apache/spark/pull/29703.
    It renames the `HADOOP_VERSION` environment variable to `PYSPARK_HADOOP_VERSION` in case `HADOOP_VERSION` is already being used elsewhere. The internal `SPARK_VERSION` and `HIVE_VERSION` variables read by `setup.py` are likewise renamed to `PYSPARK_VERSION` and `PYSPARK_HIVE_VERSION`. `HADOOP_VERSION` is arguably a common name; it appears, for example, in:
    - https://www.ibm.com/support/knowledgecenter/SSZUMP_7.2.1/install_grid_sym/understanding_advanced_edition.html
    - https://cwiki.apache.org/confluence/display/ARROW/HDFS+Filesystem+Support
    - http://crs4.github.io/pydoop/_pydoop1/installation.html
    
    ### Why are the changes needed?
    
    To avoid unexpected conflicts with an environment variable that is already in use.
    
    ### Does this PR introduce _any_ user-facing change?
    
    It renames the environment variable, but the variable has not been released yet.
    
    ### How was this patch tested?
    
    Existing unit tests cover this change.
    
    Closes #31028 from HyukjinKwon/SPARK-32017-followup.
    
    Authored-by: HyukjinKwon <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
    (cherry picked from commit 329850c667305053e4433c4c6da0e47b231302d4)
    Signed-off-by: HyukjinKwon <[email protected]>
---
 python/docs/source/getting_started/install.rst | 10 +++++-----
 python/pyspark/find_spark_home.py              |  2 +-
 python/setup.py                                | 14 +++++++-------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index a90f5fe..c548542 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -48,11 +48,11 @@ If you want to install extra dependencies for a specific component, you can inst
 
     pip install pyspark[sql]
 
-For PySpark with/without a specific Hadoop version, you can install it by using ``HADOOP_VERSION`` environment variables as below:
+For PySpark with/without a specific Hadoop version, you can install it by using ``PYSPARK_HADOOP_VERSION`` environment variables as below:
 
 .. code-block:: bash
 
-    HADOOP_VERSION=2.7 pip install pyspark
+    PYSPARK_HADOOP_VERSION=2.7 pip install pyspark
 
 The default distribution uses Hadoop 3.2 and Hive 2.3. If users specify different versions of Hadoop, the pip installation automatically
 downloads a different version and use it in PySpark. Downloading it can take a while depending on
@@ -60,15 +60,15 @@ the network and the mirror chosen. ``PYSPARK_RELEASE_MIRROR`` can be set to manu
 
 .. code-block:: bash
 
-    PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org HADOOP_VERSION=2.7 pip install
+    PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=2.7 pip install
 
 It is recommended to use ``-v`` option in ``pip`` to track the installation and download status.
 
 .. code-block:: bash
 
-    HADOOP_VERSION=2.7 pip install pyspark -v
+    PYSPARK_HADOOP_VERSION=2.7 pip install pyspark -v
 
-Supported values in ``HADOOP_VERSION`` are:
+Supported values in ``PYSPARK_HADOOP_VERSION`` are:
 
 - ``without``: Spark pre-built with user-provided Apache Hadoop
 - ``2.7``: Spark pre-built for Apache Hadoop 2.7
diff --git a/python/pyspark/find_spark_home.py b/python/pyspark/find_spark_home.py
index 4521a36..62a36d4 100755
--- a/python/pyspark/find_spark_home.py
+++ b/python/pyspark/find_spark_home.py
@@ -36,7 +36,7 @@ def _find_spark_home():
                 (os.path.isdir(os.path.join(path, "jars")) or
                  os.path.isdir(os.path.join(path, "assembly"))))
 
-    # Spark distribution can be downloaded when HADOOP_VERSION environment variable is set.
+    # Spark distribution can be downloaded when PYSPARK_HADOOP_VERSION environment variable is set.
     # We should look up this directory first, see also SPARK-32017.
     spark_dist_dir = "spark-distribution"
     paths = [
diff --git a/python/setup.py b/python/setup.py
index f5836ec..5049173 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -125,16 +125,16 @@ class InstallCommand(install):
         spark_dist = os.path.join(self.install_lib, "pyspark", "spark-distribution")
         rmtree(spark_dist, ignore_errors=True)
 
-        if ("HADOOP_VERSION" in os.environ) or ("HIVE_VERSION" in os.environ):
-            # Note that SPARK_VERSION environment is just a testing purpose.
-            # HIVE_VERSION environment variable is also internal for now in case
+        if ("PYSPARK_HADOOP_VERSION" in os.environ) or ("PYSPARK_HIVE_VERSION" in os.environ):
+            # Note that PYSPARK_VERSION environment is just a testing purpose.
+            # PYSPARK_HIVE_VERSION environment variable is also internal for now in case
             # we support another version of Hive in the future.
             spark_version, hadoop_version, hive_version = install_module.checked_versions(
-                os.environ.get("SPARK_VERSION", VERSION).lower(),
-                os.environ.get("HADOOP_VERSION", install_module.DEFAULT_HADOOP).lower(),
-                os.environ.get("HIVE_VERSION", install_module.DEFAULT_HIVE).lower())
+                os.environ.get("PYSPARK_VERSION", VERSION).lower(),
+                os.environ.get("PYSPARK_HADOOP_VERSION", install_module.DEFAULT_HADOOP).lower(),
+                os.environ.get("PYSPARK_HIVE_VERSION", install_module.DEFAULT_HIVE).lower())
 
-            if ("SPARK_VERSION" not in os.environ and
+            if ("PYSPARK_VERSION" not in os.environ and
                 ((install_module.DEFAULT_HADOOP, install_module.DEFAULT_HIVE) ==
                     (hadoop_version, hive_version))):
                 # Do not download and install if they are same as default.
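
For readers skimming the patch, the environment lookup in the patched `setup.py` hunk can be approximated as below. This is only a minimal sketch under stated assumptions, not the actual implementation: `requested_versions` is a hypothetical helper, the two default values are assumed from the documentation text in the diff ("Hadoop 3.2 and Hive 2.3") rather than copied from `install_module`, and the real code additionally validates the values through `install_module.checked_versions` and honors the testing-only `PYSPARK_VERSION` variable.

```python
import os

# Assumed defaults, taken from the documentation text in the diff above.
# The real constants are install_module.DEFAULT_HADOOP / DEFAULT_HIVE.
DEFAULT_HADOOP = "3.2"
DEFAULT_HIVE = "2.3"


def requested_versions(environ=os.environ):
    """Return (hadoop, hive) when a non-default distribution is requested.

    Hypothetical helper: it mirrors only the environment-variable lookup
    and returns None when nothing extra would need to be downloaded.
    """
    # Only the PYSPARK_-prefixed names are consulted, so an unrelated
    # HADOOP_VERSION already present in the environment is ignored.
    if ("PYSPARK_HADOOP_VERSION" not in environ
            and "PYSPARK_HIVE_VERSION" not in environ):
        return None

    hadoop = environ.get("PYSPARK_HADOOP_VERSION", DEFAULT_HADOOP).lower()
    hive = environ.get("PYSPARK_HIVE_VERSION", DEFAULT_HIVE).lower()

    # Requesting exactly the defaults is treated as "no custom distribution".
    if (hadoop, hive) == (DEFAULT_HADOOP, DEFAULT_HIVE):
        return None
    return hadoop, hive
```

With this sketch, an environment that only defines the old `HADOOP_VERSION` name is treated the same as one that defines neither variable, which is the conflict the rename is meant to avoid.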

