This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 12aa1674bce4 [SPARK-55960][INFRA][DOCS][FOLLOW-UP] Document how to re-generate the protobuf files for python client
12aa1674bce4 is described below

commit 12aa1674bce48184968a0d28c366ef275e67a2d5
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Mar 12 16:59:57 2026 +0800

    [SPARK-55960][INFRA][DOCS][FOLLOW-UP] Document how to re-generate the protobuf files for python client
    
    ### What changes were proposed in this pull request?
    Document how to re-generate the protobuf files for python client
    
    ### Why are the changes needed?
    To guide contributors and AI tools.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    CI
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #54767 from zhengruifeng/doc_py_cg.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 dev/spark-test-image/connect-gen-protos/Dockerfile |  7 +--
 sql/connect/common/src/main/protobuf/README.md     | 71 ++++++++++++++++++++++
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/dev/spark-test-image/connect-gen-protos/Dockerfile b/dev/spark-test-image/connect-gen-protos/Dockerfile
index 0e784e532af0..28c072ece678 100644
--- a/dev/spark-test-image/connect-gen-protos/Dockerfile
+++ b/dev/spark-test-image/connect-gen-protos/Dockerfile
@@ -15,12 +15,6 @@
 # limitations under the License.
 #
 
-# Usage:
-# 1, Build the image
-# docker build -t connect-cg dev/spark-test-image/connect-gen-protos/
-# 2, Run the image under spark repo
-# docker run -it --rm -v "$(pwd)":/spark connect-cg
-
 # Image for generating Spark Connect protobuf files. Based on Ubuntu 24.04.
 FROM ubuntu:noble
 LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
@@ -55,6 +49,7 @@ ENV VIRTUAL_ENV=/opt/spark-venv
 RUN python3.12 -m venv $VIRTUAL_ENV
 ENV PATH="$VIRTUAL_ENV/bin:$PATH"
 
+# Keep these versions in sync with the pinned versions in dev/requirements.txt
 RUN python3.12 -m pip install \
     'mypy==1.19.1' \
     'mypy-protobuf==3.3.0' \
diff --git a/sql/connect/common/src/main/protobuf/README.md b/sql/connect/common/src/main/protobuf/README.md
new file mode 100644
index 000000000000..f2093e6c15c5
--- /dev/null
+++ b/sql/connect/common/src/main/protobuf/README.md
@@ -0,0 +1,71 @@
+# Spark Connect Protobuf Definitions
+
+This directory contains the `.proto` files that define the Spark Connect protocol.
+
+After modifying any `.proto` file here, regenerate the Python stubs under
+`python/pyspark/sql/connect/proto/` using one of the two methods below.
+
+---
+
+## Method 1: Docker image (recommended)
+
+This method does not require any local tool installation and produces a
+reproducible environment.
+
+### Build the image
+
+```bash
+docker build -t connect-cg dev/spark-test-image/connect-gen-protos/
+```
+
+### Run the image
+
+From the root of the Spark repository:
+
+```bash
+docker run --cpus 1 -it --rm -v "$(pwd)":/spark connect-cg
+```
+
+The container mounts the repository at `/spark`, runs `dev/connect-gen-protos.sh`
+inside the container, and writes the generated files to
+`python/pyspark/sql/connect/proto/` in your local checkout.
+
+---
+
+## Method 2: Local Python environment
+
+### Prerequisites
+
+Install the required tools:
+
+- [`buf`](https://buf.build/docs/cli/installation/) — protobuf code generator
+- Python 3.12+
+
+Install the required Python packages. Check `dev/requirements.txt` for the latest
+pinned versions of `mypy`, `mypy-protobuf`, and `black`, then run:
+
+```bash
+pip install 'mypy==<version>' 'mypy-protobuf==<version>' 'black==<version>'
+```
+
+For example, based on the current `dev/requirements.txt`:
+
+```bash
+pip install 'mypy==1.19.1' 'mypy-protobuf==3.3.0' 'black==23.12.1'
+```
+
+### Generate
+
+From the root of the Spark repository:
+
+```bash
+./dev/connect-gen-protos.sh
+```
+
+The generated Python files will be written to `python/pyspark/sql/connect/proto/`.
+
+You can also generate to a custom output directory by passing a path:
+
+```bash
+./dev/connect-gen-protos.sh /tmp/my-proto-output
+```
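
One possible way to confirm that the checked-in stubs are up to date is to generate into a scratch directory, using the custom-output-path form shown at the end of the README above, and compare the result with `python/pyspark/sql/connect/proto/`. The sketch below is not part of this commit. The function name `check_protos_in_sync` is hypothetical, and it assumes that the generator takes the output directory as its only argument and that both directories are expected to hold the same set of files.

```bash
#!/usr/bin/env bash
# Sketch only: compare freshly generated stubs with a checked-in directory.
# Assumption: the generator script accepts an output directory as its sole
# argument, as the README describes for dev/connect-gen-protos.sh.

# Usage: check_protos_in_sync GENERATOR CHECKED_IN_DIR
# Returns 0 if the trees match, 1 if they differ, 2 if generation failed.
check_protos_in_sync() {
  local generator="$1"
  local checked_in="$2"
  local tmp_out
  tmp_out="$(mktemp -d)" || return 2

  # Generate into a scratch directory so the working tree is left untouched.
  if ! "$generator" "$tmp_out"; then
    rm -rf "$tmp_out"
    return 2
  fi

  # diff -r exits 0 when the trees are identical and 1 when they differ.
  local status=0
  diff -r "$checked_in" "$tmp_out" || status=$?

  rm -rf "$tmp_out"
  return "$status"
}
```

From the root of the Spark repository this could be called as `check_protos_in_sync ./dev/connect-gen-protos.sh python/pyspark/sql/connect/proto`. Files that exist on only one side are reported as differences; adding `-x PATTERN` to the `diff` call is one way to ignore such files.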

