This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 294af6e31639 [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default
294af6e31639 is described below

commit 294af6e31639d6f6ac51f961f319866f077b5302
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Sep 16 20:52:28 2024 -0700

    [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to limit `Sphinx` build parallelism to 4 by default, with the following goals.
    - This preserves the current build speed in the GitHub Actions environment.
    - This prevents excessive `SparkSubmit` invocations on large machines like `c6i.24xlarge`.
    - Users can still override the default by providing `SPHINXOPTS`.
    
    ### Why are the changes needed?
    
    The `Sphinx` parallelism feature was added via the following PR on 2024-01-10.
    
    - #44680
    
    Unfortunately, this breaks Python API doc generation on large machines, because the `Sphinx` parallelism determines the number of parallel `SparkSubmit` invocations of PySpark. In addition, given that each `PySpark` instance is currently launched with `local[*]`, this ends up with `N * N` `pyspark.daemon`s.
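
As a rough illustration of the `N * N` growth described above, the sketch below compares `-j auto` with `-j 4`. It is an estimate, not a measurement: it assumes one PySpark launch per Sphinx worker and one Python worker per `local[*]` core, and the helper name `estimate_python_workers` and the 16- and 96-core machine sizes are chosen here for illustration only.

```python
# Back-of-the-envelope estimate of Python worker processes during the docs build.
# Assumptions (not measured): each Sphinx worker launches one PySpark with
# local[*], and each PySpark may run up to one Python worker per core.


def estimate_python_workers(sphinx_jobs: int, cores: int) -> int:
    """Return the estimated upper bound of Python workers across all Sphinx workers."""
    return sphinx_jobs * cores


if __name__ == "__main__":
    # Core counts are illustrative; 4 matches the GitHub Actions runner size
    # mentioned in the commit message.
    for cores in (4, 16, 96):
        with_auto = estimate_python_workers(cores, cores)  # "-j auto": one job per core
        with_four = estimate_python_workers(4, cores)      # "-j 4": fixed job count
        print(f"{cores:>3} cores: -j auto -> {with_auto:>5}, -j 4 -> {with_four:>4}")
```

With 4 cores the two settings give the same estimate, which is consistent with the goal of keeping the GitHub Actions build speed unchanged; on larger machines the fixed value grows linearly with core count instead of quadratically.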
    
    In other words, as of today, the default setting, `auto`, seems to work only on low-core machines like `GitHub Action` runners (4 cores). For example, it breaks the `Python` documentation build even in an M3 Max environment, and it is worse on large EC2 machines (c7i.24xlarge). You can reproduce the failure locally like this.
    
    ```
    $ build/sbt package -Phive-thriftserver
    $ cd python/docs
    $ make html
    ...
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
    24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
    24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
    24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
    24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
    24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
    ...
    java.lang.OutOfMemoryError: Java heap space
    ...
    24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
    ...
    make: *** [html] Error 2
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this is a dev-only change.
    
    ### How was this patch tested?
    
    Pass the CIs and do manual tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #48129 from dongjoon-hyun/SPARK-49680.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/docs/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/docs/Makefile b/python/docs/Makefile
index 5058c1206171..428b0d24b568 100644
--- a/python/docs/Makefile
+++ b/python/docs/Makefile
@@ -16,7 +16,7 @@
 # Minimal makefile for Sphinx documentation
 
 # You can set these variables from the command line.
-SPHINXOPTS    ?= "-W" "-j" "auto"
+SPHINXOPTS    ?= "-W" "-j" "4"
 SPHINXBUILD   ?= sphinx-build
 SOURCEDIR     ?= source
 BUILDDIR      ?= build

