This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 48f6c435b228 [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector
48f6c435b228 is described below
commit 48f6c435b228f56ab4e0a57d30c49ce075ed49a0
Author: Kent Yao <[email protected]>
AuthorDate: Tue Jan 16 17:43:40 2024 +0800
[SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector
### What changes were proposed in this pull request?
JEP 363 removed the CMS (Concurrent Mark Sweep) garbage collector in JDK 14, so this PR removes the documentation's recommendation to use it.
### Why are the changes needed?
Fix misleading doc
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
doc build
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44746 from yaooqinn/SPARK-46729.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/streaming-programming-guide.md | 4 ----
1 file changed, 4 deletions(-)
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index e5053f1af362..96dd5528aac5 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -2338,10 +2338,6 @@ There are a few parameters that can help you tune the memory usage and GC overhe
 * **Clearing old data**: By default, all input data and persisted RDDs generated by DStream transformations are automatically cleared. Spark Streaming decides when to clear the data based on the transformations that are used. For example, if you are using a window operation of 10 minutes, then Spark Streaming will keep around the last 10 minutes of data, and actively throw away older data.
 Data can be retained for a longer duration (e.g. interactively querying older data) by setting `streamingContext.remember`.
-* **CMS Garbage Collector**: Use of the concurrent mark-and-sweep GC is strongly recommended for keeping GC-related pauses consistently low. Even though concurrent GC is known to reduce the
-overall processing throughput of the system, its use is still recommended to achieve more
-consistent batch processing times. Make sure you set the CMS GC on both the driver (using `--driver-java-options` in `spark-submit`) and the executors (using [Spark configuration](configuration.html#runtime-environment) `spark.executor.extraJavaOptions`).
-
 * **Other tips**: To further reduce GC overheads, here are some more tips to try.
     - Persist RDDs using the `OFF_HEAP` storage level. See more detail in the [Spark Programming Guide](rdd-programming-guide.html#rdd-persistence).
     - Use more executors with smaller heap sizes. This will reduce the GC pressure within each JVM heap.
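For context on the removed advice: CMS was deleted from the JDK in JDK 14 (JEP 363), so jobs that still pass `-XX:+UseConcMarkSweepGC` will fail to start on modern JVMs. A minimal configuration sketch of switching the same two settings mentioned in the removed paragraph over to G1 (the JDK default collector since JDK 9) is shown below; the application class and jar name are placeholders, not part of this commit:

```shell
# Sketch only: replace the old CMS flag with G1GC on both the
# driver (via --driver-java-options) and the executors
# (via spark.executor.extraJavaOptions).
spark-submit \
  --driver-java-options "-XX:+UseG1GC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --class org.example.StreamingApp \
  streaming-app.jar
```

Since G1 is already the default on JDK 9+, omitting the flag entirely also yields G1; setting it explicitly is only needed to override a different collector configured elsewhere.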
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]