This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 9bfc8460d63 [SPARK-40993][SPARK-41705][CONNECT] Move Spark Connect documentation and script to dev/ and Python documentation
9bfc8460d63 is described below

commit 9bfc8460d6379e38a775969f463f1de81474a0ae
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Jan 2 17:36:44 2023 +0900

    [SPARK-40993][SPARK-41705][CONNECT] Move Spark Connect documentation and script to dev/ and Python documentation
    
    ### What changes were proposed in this pull request?
    
    This PR takes over https://github.com/apache/spark/pull/39211 and https://github.com/apache/spark/pull/38477, which propose:
    
    - Move `connector/connect/dev/generate_protos.sh` → `dev/connect-gen-protos.sh` to be consistent with other scripts under `dev/`
    - Move Python-specific development guides into `python/docs/source/development/testing.rst`
    
    ### Why are the changes needed?
    
    To keep the project structure and documentation consistent.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Python-specific development guides for Spark Connect will be added at https://spark.apache.org/docs/latest/api/python/development/testing.html.
    
    ### How was this patch tested?
    
    I manually tested:
    
    ```
    ./dev/connect-gen-protos.sh
    ./dev/connect-check-protos.py
    ```
    
    I also manually verified the Python documentation.
    
    Closes #39338 from HyukjinKwon/SPARK-41705.
    
    Lead-authored-by: Hyukjin Kwon <[email protected]>
    Co-authored-by: Ted Yu <[email protected]>
    Co-authored-by: Rui Wang <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .github/workflows/build_and_test.yml               |  2 +-
 connector/connect/README.md                        | 74 +++-------------------
 ...k-codegen-python.py => connect-check-protos.py} |  4 +-
 .../connect-gen-protos.sh                          |  4 +-
 python/docs/source/development/contributing.rst    |  2 +
 python/docs/source/development/testing.rst         | 52 ++++++++++++++-
 6 files changed, 68 insertions(+), 70 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 443fbf47942..17c4f06dc28 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -611,7 +611,7 @@ jobs:
     - name: Python linter
       run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
     - name: Python code generation check
-      run: if test -f ./dev/check-codegen-python.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/check-codegen-python.py; fi
+      run: if test -f ./dev/connect-check-protos.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/connect-check-protos.py; fi
     - name: R linter
       run: ./dev/lint-r
     - name: JS linter
diff --git a/connector/connect/README.md b/connector/connect/README.md
index d5cc767c744..4f2e06678dd 100644
--- a/connector/connect/README.md
+++ b/connector/connect/README.md
@@ -1,29 +1,28 @@
-# Spark Connect - Developer Documentation
+# Spark Connect
 
 **Spark Connect is a strictly experimental feature and under heavy development.
 All APIs should be considered volatile and should not be used in production.**
 
 This module contains the implementation of Spark Connect which is a logical plan
 facade for the implementation in Spark. Spark Connect is directly integrated into the build
-of Spark. To enable it, you only need to activate the driver plugin for Spark Connect.
+of Spark.
 
 The documentation linked here is specifically for developers of Spark Connect and not
 directly intended to be end-user documentation.
 
+## Development Topics
 
-## Getting Started 
+### Guidelines for new clients
 
-### Build
+When contributing a new client please be aware that we strive to have a common
+user experience across all languages. Please follow the guidelines below:
 
-```bash
-./build/mvn -Phive clean package
-```
+* [Connection string configuration](docs/client-connection-string.md)
+* [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol.
 
-or
+### Python client development
 
-```bash
-./build/sbt -Phive clean package
-```
+Python-specific development guidelines are located in [python/docs/source/development/testing.rst](https://github.com/apache/spark/blob/master/python/docs/source/development/testing.rst), which is published at the [Development tab](https://spark.apache.org/docs/latest/api/python/development/index.html) in the PySpark documentation.
 
 ### Build with user-defined `protoc` and `protoc-gen-grpc-java`
 
@@ -48,56 +47,3 @@ export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
 The user-defined `protoc` and `protoc-gen-grpc-java` binary files can be produced in the user's compilation environment by source code compilation,
 for compilation steps, please refer to [protobuf](https://github.com/protocolbuffers/protobuf) and [grpc-java](https://github.com/grpc/grpc-java).
 
-
-### Run Spark Shell
-
-To run Spark Connect you locally built:
-
-```bash
-# Scala shell
-./bin/spark-shell \
-  --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
-  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-
-# PySpark shell
-./bin/pyspark \
-  --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
-  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-```
-
-To use the release version of Spark Connect:
-
-```bash
-./bin/spark-shell \
-  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
-  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-```
-
-### Run Tests
-
-```bash
-# Run a single Python class.
-./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_basic'
-```
-
-```bash
-# Run all Spark Connect Python tests as a module.
-./python/run-tests --module pyspark-connect --parallelism 1
-```
-
-
-## Development Topics
-
-### Generate proto generated files for the Python client
-1. Install `buf version 1.11.0`: https://docs.buf.build/installation
-2. Run `pip install grpcio==1.48.1 protobuf==3.19.5 mypy-protobuf==3.3.0 googleapis-common-protos==1.56.4 grpcio-status==1.48.1`
-3. Run `./connector/connect/dev/generate_protos.sh`
-4. Optional Check `./dev/check-codegen-python.py`
-
-### Guidelines for new clients
-
-When contributing a new client please be aware that we strive to have a common
-user experience across all languages. Please follow the below guidelines:
-
-* [Connection string configuration](docs/client-connection-string.md)
-* [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol.
diff --git a/dev/check-codegen-python.py b/dev/connect-check-protos.py
similarity index 94%
rename from dev/check-codegen-python.py
rename to dev/connect-check-protos.py
index bcb2b0341da..b902274b1f4 100755
--- a/dev/check-codegen-python.py
+++ b/dev/connect-check-protos.py
@@ -46,7 +46,7 @@ def run_cmd(cmd):
 def check_connect_protos():
     print("Start checking the generated codes in pyspark-connect.")
     with tempfile.TemporaryDirectory() as tmp:
-        run_cmd(f"{SPARK_HOME}/connector/connect/dev/generate_protos.sh {tmp}")
+        run_cmd(f"{SPARK_HOME}/dev/connect-gen-protos.sh {tmp}")
         result = filecmp.dircmp(
             f"{SPARK_HOME}/python/pyspark/sql/connect/proto/",
             tmp,
@@ -76,7 +76,7 @@ def check_connect_protos():
             fail(
                 "Generated files for pyspark-connect are out of sync! "
                 "If you have touched files under 
connector/connect/src/main/protobuf, "
-                "please run ./connector/connect/dev/generate_protos.sh. "
+                "please run ./dev/connect-gen-protos.sh. "
                 "If you haven't touched any file above, please rebase your PR 
against main branch."
             )
 
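For context, the sync check above amounts to the following minimal sketch: regenerate the protos into a temporary directory, then compare it against the checked-in generated files with `filecmp.dircmp`. The paths, the environment-variable fallback, and the exit message here are illustrative assumptions, not the script's exact code:

```python
# Minimal sketch of the generated-proto sync check (illustrative, not the
# exact contents of dev/connect-check-protos.py).
import filecmp
import os
import subprocess
import sys
import tempfile

# Assumption: fall back to the current directory if SPARK_HOME is unset.
SPARK_HOME = os.environ.get("SPARK_HOME", os.getcwd())


def check_generated_protos() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # Generate a fresh copy of the Python protos into the temp dir.
        subprocess.run([f"{SPARK_HOME}/dev/connect-gen-protos.sh", tmp], check=True)
        result = filecmp.dircmp(f"{SPARK_HOME}/python/pyspark/sql/connect/proto/", tmp)
        # Any file that differs, or exists on only one side, means the
        # checked-in generated code is out of sync with the .proto sources.
        if result.diff_files or result.left_only or result.right_only:
            sys.exit("Generated files are out of sync; run ./dev/connect-gen-protos.sh.")


if __name__ == "__main__":
    check_generated_protos()
```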
diff --git a/connector/connect/dev/generate_protos.sh b/dev/connect-gen-protos.sh
similarity index 96%
rename from connector/connect/dev/generate_protos.sh
rename to dev/connect-gen-protos.sh
index 38cb821a47c..cb5b66379b2 100755
--- a/connector/connect/dev/generate_protos.sh
+++ b/dev/connect-gen-protos.sh
@@ -20,12 +20,12 @@ set -ex
 
 if [[ $# -gt 1 ]]; then
   echo "Illegal number of parameters."
-  echo "Usage: ./connector/connect/dev/generate_protos.sh [path]"
+  echo "Usage: ./dev/generate_protos.sh [path]"
   exit -1
 fi
 
 
-SPARK_HOME="$(cd "`dirname $0`"/../../..; pwd)"
+SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
 cd "$SPARK_HOME"
 
 
diff --git a/python/docs/source/development/contributing.rst b/python/docs/source/development/contributing.rst
index 88f7b3a7b43..385e7db035d 100644
--- a/python/docs/source/development/contributing.rst
+++ b/python/docs/source/development/contributing.rst
@@ -120,6 +120,8 @@ Prerequisite
 
 PySpark development requires to build Spark that needs a proper JDK installed, etc. See `Building Spark <https://spark.apache.org/docs/latest/building-spark.html>`_ for more details.
 
+Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.11.0`` is required; see `Buf Installation <https://docs.buf.build/installation>`_ for more details.
+
 Conda
 ~~~~~
 
diff --git a/python/docs/source/development/testing.rst b/python/docs/source/development/testing.rst
index 3eab8d04511..0262c318cd6 100644
--- a/python/docs/source/development/testing.rst
+++ b/python/docs/source/development/testing.rst
@@ -25,6 +25,11 @@ In order to run PySpark tests, you should build Spark itself first via Maven or
 
     build/mvn -DskipTests clean package
 
+.. code-block:: bash
+
+    build/sbt -Phive clean package
+
+
 After that, the PySpark test cases can be run via using ``python/run-tests``. For example,
 
 .. code-block:: bash
@@ -49,9 +54,54 @@ You can run a specific test via using ``python/run-tests``, for example, as below
 Please refer to `Testing PySpark <https://spark.apache.org/developer-tools.html>`_ for more details.
 
 
-Running tests using GitHub Actions
+Running Tests using GitHub Actions
 ----------------------------------
 
 You can run the full PySpark tests by using GitHub Actions in your own forked GitHub
 repository with a few clicks. Please refer to
 `Running tests in your forked repository using GitHub Actions <https://spark.apache.org/developer-tools.html>`_ for more details.
+
+
+Running Tests for Spark Connect
+-------------------------------
+
+Running Tests for Python Client
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to run the tests for Spark Connect in Python, you should pass the ``--parallelism 1`` option, for example, as below:
+
+.. code-block:: bash
+
+    python/run-tests --module pyspark-connect --parallelism 1
+
+Note that if you made some changes in the Protobuf definitions, for example, at
+`spark/connector/connect/common/src/main/protobuf/spark/connect <https://github.com/apache/spark/tree/master/connector/connect/common/src/main/protobuf/spark/connect>`_,
+you should regenerate the Python Protobuf client by running ``dev/connect-gen-protos.sh``.
+
+
+Running PySpark Shell with Python Client
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To run the Spark Connect server that you locally built:
+
+.. code-block:: bash
+
+    bin/spark-shell \
+      --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+      --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+
+To run the Spark Connect server from the Apache Spark release:
+
+.. code-block:: bash
+
+    bin/spark-shell \
+      --packages org.apache.spark:spark-connect_2.12:3.4.0 \
+      --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+
+
+To run the PySpark Shell with the client for the Spark Connect server:
+
+.. code-block:: bash
+
+    bin/pyspark --remote sc://localhost
+

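As a quick illustration of the ``--remote`` option added above: once a Spark Connect server is running (for example, via the spark-shell commands in the diff), the Python client can attach to it by its remote URL. This is a minimal sketch assuming a default local server; the ``sc://localhost`` connection string comes from the docs above, and everything else is an assumption about a local setup:

```python
# Minimal sketch: attach a PySpark session to a locally running
# Spark Connect server by remote URL. Assumes Spark 3.4+ with the
# Spark Connect Python dependencies installed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# A trivial query executed through the Spark Connect client.
spark.range(5).show()
```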

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
