This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 48f6c435b228 [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector
48f6c435b228 is described below

commit 48f6c435b228f56ab4e0a57d30c49ce075ed49a0
Author: Kent Yao <[email protected]>
AuthorDate: Tue Jan 16 17:43:40 2024 +0800

    [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector
    
    ### What changes were proposed in this pull request?
    
    JEP 363 removed the CMS garbage collector. So, we are removing the recommendation to use it in this PR.
    
    ### Why are the changes needed?
    
    Fix misleading doc
    
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    doc build
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    no
    
    Closes #44746 from yaooqinn/SPARK-46729.
    
    Authored-by: Kent Yao <[email protected]>
    Signed-off-by: Kent Yao <[email protected]>
---
 docs/streaming-programming-guide.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index e5053f1af362..96dd5528aac5 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -2338,10 +2338,6 @@ There are a few parameters that can help you tune the memory usage and GC overhe
 * **Clearing old data**: By default, all input data and persisted RDDs generated by DStream transformations are automatically cleared. Spark Streaming decides when to clear the data based on the transformations that are used. For example, if you are using a window operation of 10 minutes, then Spark Streaming will keep around the last 10 minutes of data, and actively throw away older data.
 Data can be retained for a longer duration (e.g. interactively querying older data) by setting `streamingContext.remember`.
 
-* **CMS Garbage Collector**: Use of the concurrent mark-and-sweep GC is strongly recommended for keeping GC-related pauses consistently low. Even though concurrent GC is known to reduce the
-overall processing throughput of the system, its use is still recommended to achieve more
-consistent batch processing times. Make sure you set the CMS GC on both the driver (using `--driver-java-options` in `spark-submit`) and the executors (using [Spark configuration](configuration.html#runtime-environment) `spark.executor.extraJavaOptions`).
-
 * **Other tips**: To further reduce GC overheads, here are some more tips to try.
     - Persist RDDs using the `OFF_HEAP` storage level. See more detail in the [Spark Programming Guide](rdd-programming-guide.html#rdd-persistence).
     - Use more executors with smaller heap sizes. This will reduce the GC pressure within each JVM heap.
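
Illustrative note (not part of the commit above): the mechanism the removed paragraph pointed at is unchanged, i.e. executors pick up JVM options from `spark.executor.extraJavaOptions` and the driver takes `--driver-java-options` on `spark-submit`. A minimal Scala sketch, with a hypothetical application name and G1GC (the JDK default collector since CMS's removal under JEP 363) used purely as an example:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Executor JVM options can be set through Spark configuration before the
// streaming context is created; the G1GC flags are only an illustration.
val conf = new SparkConf()
  .setAppName("GcTuningExample") // hypothetical application name
  .setMaster("local[2]")         // local master, for illustration only
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:MaxGCPauseMillis=200")

// Driver JVM options must instead be passed at launch time, e.g.
//   spark-submit --driver-java-options "-XX:+UseG1GC" ...
val ssc = new StreamingContext(conf, Seconds(10))
```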

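Likewise, the retained `OFF_HEAP` bullet can be sketched as below; the socket source, port, and off-heap size are hypothetical, and `OFF_HEAP` storage additionally requires `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size` to be set:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// OFF_HEAP storage needs off-heap memory to be enabled and sized.
val conf = new SparkConf()
  .setAppName("OffHeapPersistExample") // hypothetical application name
  .setMaster("local[2]")               // local master, for illustration only
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1g")

val ssc = new StreamingContext(conf, Seconds(10))

// Hypothetical socket source; any DStream would do.
val lines = ssc.socketTextStream("localhost", 9999)

// Persist the DStream's RDDs off-heap instead of on the JVM heap,
// reducing GC pressure as the "Other tips" bullet suggests.
lines.persist(StorageLevel.OFF_HEAP)

lines.print() // an output operation is required before starting the context
ssc.start()
ssc.awaitTermination()
```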

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
