This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 12aa1674bce4 [SPARK-55960][INFRA][DOCS][FOLLOW-UP] Document how to re-generate the protobuf files for python client
12aa1674bce4 is described below
commit 12aa1674bce48184968a0d28c366ef275e67a2d5
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Mar 12 16:59:57 2026 +0800
[SPARK-55960][INFRA][DOCS][FOLLOW-UP] Document how to re-generate the protobuf files for python client
### What changes were proposed in this pull request?
Document how to re-generate the protobuf files for python client
### Why are the changes needed?
To guide contributors and AI tools when regenerating the protobuf files.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #54767 from zhengruifeng/doc_py_cg.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
dev/spark-test-image/connect-gen-protos/Dockerfile | 7 +--
sql/connect/common/src/main/protobuf/README.md | 71 ++++++++++++++++++++++
2 files changed, 72 insertions(+), 6 deletions(-)
diff --git a/dev/spark-test-image/connect-gen-protos/Dockerfile b/dev/spark-test-image/connect-gen-protos/Dockerfile
index 0e784e532af0..28c072ece678 100644
--- a/dev/spark-test-image/connect-gen-protos/Dockerfile
+++ b/dev/spark-test-image/connect-gen-protos/Dockerfile
@@ -15,12 +15,6 @@
# limitations under the License.
#
-# Usage:
-# 1, Build the image
-# docker build -t connect-cg dev/spark-test-image/connect-gen-protos/
-# 2, Run the image under spark repo
-# docker run -it --rm -v "$(pwd)":/spark connect-cg
-
# Image for generating Spark Connect protobuf files. Based on Ubuntu 24.04.
FROM ubuntu:noble
LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
@@ -55,6 +49,7 @@ ENV VIRTUAL_ENV=/opt/spark-venv
RUN python3.12 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+# Keep these versions in sync with the pinned versions in dev/requirements.txt
RUN python3.12 -m pip install \
'mypy==1.19.1' \
'mypy-protobuf==3.3.0' \
diff --git a/sql/connect/common/src/main/protobuf/README.md b/sql/connect/common/src/main/protobuf/README.md
new file mode 100644
index 000000000000..f2093e6c15c5
--- /dev/null
+++ b/sql/connect/common/src/main/protobuf/README.md
@@ -0,0 +1,71 @@
+# Spark Connect Protobuf Definitions
+
+This directory contains the `.proto` files that define the Spark Connect protocol.
+
+After modifying any `.proto` file here, regenerate the Python stubs under
+`python/pyspark/sql/connect/proto/` using one of the two methods below.
+
+---
+
+## Method 1: Docker image (recommended)
+
+This method does not require any local tool installation and produces a
+reproducible environment.
+
+### Build the image
+
+```bash
+docker build -t connect-cg dev/spark-test-image/connect-gen-protos/
+```
+
+### Run the image
+
+From the root of the Spark repository:
+
+```bash
+docker run --cpus 1 -it --rm -v "$(pwd)":/spark connect-cg
+```
+
+The container mounts the repository at `/spark`, runs `dev/connect-gen-protos.sh`
+inside the container, and writes the generated files to
+`python/pyspark/sql/connect/proto/` in your local checkout.
+
+---
+
+## Method 2: Local Python environment
+
+### Prerequisites
+
+Install the required tools:
+
+- [`buf`](https://buf.build/docs/cli/installation/) — protobuf code generator
+- Python 3.12+
+
+Install the required Python packages. Check `dev/requirements.txt` for the latest
+pinned versions of `mypy`, `mypy-protobuf`, and `black`, then run:
+
+```bash
+pip install 'mypy==<version>' 'mypy-protobuf==<version>' 'black==<version>'
+```
+
+For example, based on the current `dev/requirements.txt`:
+
+```bash
+pip install 'mypy==1.19.1' 'mypy-protobuf==3.3.0' 'black==23.12.1'
+```
+
+### Generate
+
+From the root of the Spark repository:
+
+```bash
+./dev/connect-gen-protos.sh
+```
+
+The generated Python files will be written to `python/pyspark/sql/connect/proto/`.
+
+You can also generate to a custom output directory by passing a path:
+
+```bash
+./dev/connect-gen-protos.sh /tmp/my-proto-output
+```
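The README added above asks contributors to look up the pinned versions in `dev/requirements.txt` by hand before running `pip install`. As a hedged sketch (not part of this commit — the function name and sample text are illustrative), a small helper could extract such exact `==` pins programmatically:

```python
import re

def pinned_versions(requirements_text, packages):
    """Return {name: version} for packages pinned with '==' in a
    requirements.txt-style string; comments and range specs are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        m = re.match(r"^([A-Za-z0-9_.\-]+)\s*==\s*(\S+)$", line)
        if m and m.group(1) in packages:
            pins[m.group(1)] = m.group(2)
    return pins

# Sample text mirroring the pins shown in the Dockerfile hunk above.
sample = """\
mypy==1.19.1          # type checker
mypy-protobuf==3.3.0
black==23.12.1
numpy>=1.21           # not an exact pin, ignored
"""
print(pinned_versions(sample, {"mypy", "mypy-protobuf", "black"}))
# → {'mypy': '1.19.1', 'mypy-protobuf': '3.3.0', 'black': '23.12.1'}
```

This mirrors the manual step only; the authoritative source of the versions remains `dev/requirements.txt` itself.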
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]