This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9bfc8460d63 [SPARK-40993][SPARK-41705][CONNECT] Move Spark Connect documentation and script to dev/ and Python documentation
9bfc8460d63 is described below
commit 9bfc8460d6379e38a775969f463f1de81474a0ae
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Jan 2 17:36:44 2023 +0900
[SPARK-40993][SPARK-41705][CONNECT] Move Spark Connect documentation and script to dev/ and Python documentation
### What changes were proposed in this pull request?
This PR takes over https://github.com/apache/spark/pull/39211 and https://github.com/apache/spark/pull/38477, which propose to:
- Move `connector/connect/dev/generate_protos.sh` → `dev/connect-gen-protos.sh` to be consistent with other scripts under `dev/`
- Move Python-specific development guides into `python/docs/source/development/testing.rst`
### Why are the changes needed?
To keep the project structure and documentation consistent.
### Does this PR introduce _any_ user-facing change?
Python-specific development guides for Spark Connect will be added in https://spark.apache.org/docs/latest/api/python/development/testing.html.
### How was this patch tested?
I manually tested:
```
./dev/connect-gen-protos.sh
./dev/connect-check-protos.py
```
I also manually verified the Python documentation.
Closes #39338 from HyukjinKwon/SPARK-41705.
Lead-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Ted Yu <[email protected]>
Co-authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.github/workflows/build_and_test.yml | 2 +-
connector/connect/README.md | 74 +++-------------------
...k-codegen-python.py => connect-check-protos.py} | 4 +-
.../connect-gen-protos.sh | 4 +-
python/docs/source/development/contributing.rst | 2 +
python/docs/source/development/testing.rst | 52 ++++++++++++++-
6 files changed, 68 insertions(+), 70 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 443fbf47942..17c4f06dc28 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -611,7 +611,7 @@ jobs:
- name: Python linter
run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
- name: Python code generation check
- run: if test -f ./dev/check-codegen-python.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/check-codegen-python.py; fi
+ run: if test -f ./dev/connect-check-protos.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/connect-check-protos.py; fi
- name: R linter
run: ./dev/lint-r
- name: JS linter
diff --git a/connector/connect/README.md b/connector/connect/README.md
index d5cc767c744..4f2e06678dd 100644
--- a/connector/connect/README.md
+++ b/connector/connect/README.md
@@ -1,29 +1,28 @@
-# Spark Connect - Developer Documentation
+# Spark Connect
**Spark Connect is a strictly experimental feature and under heavy development.
All APIs should be considered volatile and should not be used in production.**
This module contains the implementation of Spark Connect which is a logical plan
facade for the implementation in Spark. Spark Connect is directly integrated into the build
-of Spark. To enable it, you only need to activate the driver plugin for Spark Connect.
+of Spark.
The documentation linked here is specifically for developers of Spark Connect and not
directly intended to be end-user documentation.
+## Development Topics
-## Getting Started
+### Guidelines for new clients
-### Build
+When contributing a new client, please be aware that we strive to have a common
+user experience across all languages. Please follow the guidelines below:
-```bash
-./build/mvn -Phive clean package
-```
+* [Connection string configuration](docs/client-connection-string.md)
+* [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol.
-or
+### Python client development
-```bash
-./build/sbt -Phive clean package
-```
+Python-specific development guidelines are located in
+[python/docs/source/development/testing.rst](https://github.com/apache/spark/blob/master/python/docs/source/development/testing.rst),
+which is published under the [Development tab](https://spark.apache.org/docs/latest/api/python/development/index.html) in the PySpark documentation.
### Build with user-defined `protoc` and `protoc-gen-grpc-java`
@@ -48,56 +47,3 @@ export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
The user-defined `protoc` and `protoc-gen-grpc-java` binaries can be
produced in the user's compilation environment by compiling them from source;
for the compilation steps, please refer to
[protobuf](https://github.com/protocolbuffers/protobuf) and
[grpc-java](https://github.com/grpc/grpc-java).
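For illustration, a rough sketch of how these binaries are wired into the build (the `CONNECT_PROTOC_EXEC_PATH` variable and the `user-defined-protoc` profile are assumptions for illustration, not taken from this diff; only `CONNECT_PLUGIN_EXEC_PATH` appears in the section above):

```bash
# Sketch only: point the build at locally compiled binaries.
# CONNECT_PROTOC_EXEC_PATH and the user-defined-protoc profile are assumed
# names; CONNECT_PLUGIN_EXEC_PATH is from the section above.
export CONNECT_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/mvn -Phive -Puser-defined-protoc clean package
```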
-
-### Run Spark Shell
-
-To run Spark Connect that you built locally:
-
-```bash
-# Scala shell
-./bin/spark-shell \
- --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
- --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-
-# PySpark shell
-./bin/pyspark \
- --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
- --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-```
-
-To use the release version of Spark Connect:
-
-```bash
-./bin/spark-shell \
- --packages org.apache.spark:spark-connect_2.12:3.4.0 \
- --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
-```
-
-### Run Tests
-
-```bash
-# Run a single Python class.
-./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_basic'
-```
-
-```bash
-# Run all Spark Connect Python tests as a module.
-./python/run-tests --module pyspark-connect --parallelism 1
-```
-
-
-## Development Topics
-
-### Generate proto generated files for the Python client
-1. Install `buf version 1.11.0`: https://docs.buf.build/installation
-2. Run `pip install grpcio==1.48.1 protobuf==3.19.5 mypy-protobuf==3.3.0 googleapis-common-protos==1.56.4 grpcio-status==1.48.1`
-3. Run `./connector/connect/dev/generate_protos.sh`
-4. Optionally, check with `./dev/check-codegen-python.py`
-
-### Guidelines for new clients
-
-When contributing a new client please be aware that we strive to have a common
-user experience across all languages. Please follow the below guidelines:
-
-* [Connection string configuration](docs/client-connection-string.md)
-* [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol.
diff --git a/dev/check-codegen-python.py b/dev/connect-check-protos.py
similarity index 94%
rename from dev/check-codegen-python.py
rename to dev/connect-check-protos.py
index bcb2b0341da..b902274b1f4 100755
--- a/dev/check-codegen-python.py
+++ b/dev/connect-check-protos.py
@@ -46,7 +46,7 @@ def run_cmd(cmd):
def check_connect_protos():
print("Start checking the generated codes in pyspark-connect.")
with tempfile.TemporaryDirectory() as tmp:
- run_cmd(f"{SPARK_HOME}/connector/connect/dev/generate_protos.sh {tmp}")
+ run_cmd(f"{SPARK_HOME}/dev/connect-gen-protos.sh {tmp}")
result = filecmp.dircmp(
f"{SPARK_HOME}/python/pyspark/sql/connect/proto/",
tmp,
@@ -76,7 +76,7 @@ def check_connect_protos():
fail(
"Generated files for pyspark-connect are out of sync! "
"If you have touched files under
connector/connect/src/main/protobuf, "
- "please run ./connector/connect/dev/generate_protos.sh. "
+ "please run ./dev/connect-gen-protos.sh. "
"If you haven't touched any file above, please rebase your PR
against main branch."
)
diff --git a/connector/connect/dev/generate_protos.sh b/dev/connect-gen-protos.sh
similarity index 96%
rename from connector/connect/dev/generate_protos.sh
rename to dev/connect-gen-protos.sh
index 38cb821a47c..cb5b66379b2 100755
--- a/connector/connect/dev/generate_protos.sh
+++ b/dev/connect-gen-protos.sh
@@ -20,12 +20,12 @@ set -ex
if [[ $# -gt 1 ]]; then
echo "Illegal number of parameters."
- echo "Usage: ./connector/connect/dev/generate_protos.sh [path]"
+ echo "Usage: ./dev/connect-gen-protos.sh [path]"
exit -1
fi
-SPARK_HOME="$(cd "`dirname $0`"/../../..; pwd)"
+SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
cd "$SPARK_HOME"
diff --git a/python/docs/source/development/contributing.rst b/python/docs/source/development/contributing.rst
index 88f7b3a7b43..385e7db035d 100644
--- a/python/docs/source/development/contributing.rst
+++ b/python/docs/source/development/contributing.rst
@@ -120,6 +120,8 @@ Prerequisite
PySpark development requires building Spark, which needs a proper JDK installed,
etc. See `Building Spark <https://spark.apache.org/docs/latest/building-spark.html>`_ for more details.
+Note that if you intend to contribute to Spark Connect in Python, ``buf`` version
+``1.11.0`` is required; see `Buf Installation <https://docs.buf.build/installation>`_ for more details.
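For example, one possible way to install it (a hypothetical sketch assuming a Linux x86_64 host and the ``$HOME/buf/bin`` location used by the CI workflow above; see the Buf installation page for other platforms):

.. code-block:: bash

    # Hypothetical install of buf 1.11.0 into $HOME/buf/bin (Linux x86_64).
    mkdir -p "$HOME/buf/bin"
    curl -sSL "https://github.com/bufbuild/buf/releases/download/v1.11.0/buf-$(uname -s)-$(uname -m)" \
        -o "$HOME/buf/bin/buf"
    chmod +x "$HOME/buf/bin/buf"
    export PATH="$PATH:$HOME/buf/bin"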
+
Conda
~~~~~
diff --git a/python/docs/source/development/testing.rst
b/python/docs/source/development/testing.rst
index 3eab8d04511..0262c318cd6 100644
--- a/python/docs/source/development/testing.rst
+++ b/python/docs/source/development/testing.rst
@@ -25,6 +25,11 @@ In order to run PySpark tests, you should build Spark itself first via Maven or
build/mvn -DskipTests clean package
+.. code-block:: bash
+
+ build/sbt -Phive clean package
+
+
After that, the PySpark test cases can be run by using ``python/run-tests``.
For example,
.. code-block:: bash
@@ -49,9 +54,54 @@ You can run a specific test via using ``python/run-tests``, for example, as belo
Please refer to `Testing PySpark
<https://spark.apache.org/developer-tools.html>`_ for more details.
-Running tests using GitHub Actions
+Running Tests using GitHub Actions
----------------------------------
You can run the full PySpark tests by using GitHub Actions in your own forked
GitHub
repository with a few clicks. Please refer to
`Running tests in your forked repository using GitHub Actions
<https://spark.apache.org/developer-tools.html>`_ for more details.
+
+
+Running Tests for Spark Connect
+-------------------------------
+
+Running Tests for Python Client
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to run the tests for Spark Connect in Python, you should pass the
+``--parallelism 1`` option, for example, as below:
+
+.. code-block:: bash
+
+ python/run-tests --module pyspark-connect --parallelism 1
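You can also target a single test class instead of the whole module; this mirrors the example removed from the README above:

.. code-block:: bash

    # Run a single Spark Connect test class rather than the whole module.
    python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_basic'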
+
+Note that if you make changes to the Protobuf definitions, for example, at
+`spark/connector/connect/common/src/main/protobuf/spark/connect <https://github.com/apache/spark/tree/master/connector/connect/common/src/main/protobuf/spark/connect>`_,
+you should regenerate the Python Protobuf client by running ``dev/connect-gen-protos.sh``.
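A typical regeneration round-trip might look as below; both scripts are the ones renamed in this commit, and the check step is optional:

.. code-block:: bash

    # Regenerate the Python Protobuf client after editing .proto files.
    ./dev/connect-gen-protos.sh
    # Optionally verify that the generated files are in sync.
    ./dev/connect-check-protos.py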
+
+
+Running PySpark Shell with Python Client
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To run the Spark Connect server you built locally:
+
+.. code-block:: bash
+
+ bin/spark-shell \
+ --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+
+To run the Spark Connect server from the Apache Spark release:
+
+.. code-block:: bash
+
+ bin/spark-shell \
+ --packages org.apache.spark:spark-connect_2.12:3.4.0 \
+ --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+
+
+To run the PySpark Shell with the client for the Spark Connect server:
+
+.. code-block:: bash
+
+ bin/pyspark --remote sc://localhost
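The connection string can also spell out the port explicitly; 15002 is assumed here to be the default Spark Connect port (see the connection string documentation linked in the README for the full syntax):

.. code-block:: bash

    # Equivalent, with the assumed default port written out.
    bin/pyspark --remote sc://localhost:15002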
+
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]