This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c1708c94fb1 [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies
c1708c94fb1 is described below

commit c1708c94fb136cc9c01c6f8461fdc8ade7175894
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Tue Dec 20 09:04:13 2022 +0900

    [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to:
    
    - Add `pyspark.sql.connect` and `pyspark.sql.protobuf` to the PySpark package in PyPI.
    - Fix the documentation to specify the dependencies for the Python Spark Connect client.
    
    ### Why are the changes needed?
    
    To guide users in using Spark Connect and Protobuf, and to package these features so they are released properly.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, this exposes both `pyspark.sql.connect` and `pyspark.sql.protobuf` to end users in the PyPI package. In addition, it fixes the user-facing documentation about the dependencies for Spark Connect.
    
    ### How was this patch tested?
    
    CI in this PR should test it out.
    
    Closes #39123 from HyukjinKwon/SPARK-41583.
    
    Lead-authored-by: Hyukjin Kwon <[email protected]>
    Co-authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/docs/source/getting_started/install.rst | 15 ++++++++-------
 python/setup.py                                | 10 ++++++++++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index dd48741099c..d3b24be3d49 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -50,6 +50,8 @@ If you want to install extra dependencies for a specific component, you can inst
     pip install pyspark[sql]
     # pandas API on Spark
    pip install pyspark[pandas_on_spark] plotly  # to plot your data, you can install plotly together.
+    # Spark Connect
+    pip install pyspark[connect]
 
 For PySpark with/without a specific Hadoop version, you can install it by using ``PYSPARK_HADOOP_VERSION`` environment variables as below:
 
@@ -151,16 +153,15 @@ To install PySpark from source, refer to |building_spark|_.
 
 Dependencies
 ------------
-============= ========================= ======================================
+============= ========================= ======================================================================================
 Package       Minimum supported version Note
-============= ========================= ======================================
-`pandas`      1.0.5                     Optional for Spark SQL
-`pyarrow`     1.0.0                     Optional for Spark SQL
+============= ========================= ======================================================================================
 `py4j`        0.10.9.7                  Required
-`pandas`      1.0.5                     Required for pandas API on Spark
-`pyarrow`     1.0.0                     Required for pandas API on Spark
+`pandas`      1.0.5                     Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
+`pyarrow`     1.0.0                     Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
 `numpy`       1.15                      Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
-============= ========================= ======================================
+`grpc`        1.48.1                    Required for Spark Connect
+============= ========================= ======================================================================================
 
 Note that PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.  
 If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow related features and refer
diff --git a/python/setup.py b/python/setup.py
index 65db3912efe..4ba2740246a 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -113,6 +113,7 @@ if (in_spark):
 # Also don't forget to update python/docs/source/getting_started/install.rst.
 _minimum_pandas_version = "1.0.5"
 _minimum_pyarrow_version = "1.0.0"
+_minimum_grpc_version = "1.48.1"
 
 
 class InstallCommand(install):
@@ -215,7 +216,10 @@ try:
                   'pyspark.ml.param',
                   'pyspark.sql',
                   'pyspark.sql.avro',
+                  'pyspark.sql.connect',
+                  'pyspark.sql.connect.proto',
                   'pyspark.sql.pandas',
+                  'pyspark.sql.protobuf',
                   'pyspark.sql.streaming',
                   'pyspark.streaming',
                   'pyspark.bin',
@@ -273,6 +277,12 @@ try:
                 'pyarrow>=%s' % _minimum_pyarrow_version,
                 'numpy>=1.15',
             ],
+            'connect': [
+                'pandas>=%s' % _minimum_pandas_version,
+                'pyarrow>=%s' % _minimum_pyarrow_version,
+                'grpc>=%s' % _minimum_grpc_version,
+                'numpy>=1.15',
+            ],
         },
         python_requires='>=3.7',
         classifiers=[

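As a quick illustration of what the new extra resolves to, here is a minimal standalone Python sketch (not part of the patch itself) that mirrors the ``connect`` entry added to ``extras_require`` in ``python/setup.py`` above; the version constants are copied from the diff:

```python
# Standalone sketch mirroring the "connect" extra added to python/setup.py
# by this commit. The minimum-version constants are copied from the patch.
_minimum_pandas_version = "1.0.5"
_minimum_pyarrow_version = "1.0.0"
_minimum_grpc_version = "1.48.1"

extras_require = {
    "connect": [
        "pandas>=%s" % _minimum_pandas_version,
        "pyarrow>=%s" % _minimum_pyarrow_version,
        "grpc>=%s" % _minimum_grpc_version,
        "numpy>=1.15",
    ],
}

# `pip install pyspark[connect]` resolves to these requirements:
print(extras_require["connect"])
# → ['pandas>=1.0.5', 'pyarrow>=1.0.0', 'grpc>=1.48.1', 'numpy>=1.15']
```

This is why the install.rst change can document Spark Connect support as a single ``pip install pyspark[connect]`` command rather than listing each dependency separately.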

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
