Behroz Sikander created SPARK-26302:
---------------------------------------
Summary: retainedBatches configuration can cause memory leak
Key: SPARK-26302
URL: https://issues.apache.org/jira/browse/SPARK-26302
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 2.4.0
Reporter: Behroz Sikander
Attachments: heap_dump_detail.png
The documentation for configuration "spark.streaming.ui.retainedBatches" says
"How many batches the Spark Streaming UI and status APIs remember before
garbage collecting"
The default for this configuration is 1000.
From our experience, the documentation is incomplete, and we found this out
the hard way.
The size of a single BatchUIData object is around 750KB. Raising
"spark.streaming.ui.retainedBatches" to something like 5000 increases the
total retained size to ~4GB.
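
The arithmetic behind these figures can be sketched as below. This is only a
rough estimate: the 750KB per-batch size is the value we observed in our heap
dump and will differ between jobs, and the function name is a hypothetical
helper, not part of Spark.

```python
# Rough estimate of driver heap held by retained batch UI data.
# 750 KB per batch is an observed figure from one job, not a Spark constant.
KB = 1000  # decimal units, matching the rough "~4GB" figure above


def retained_batches_bytes(retained_batches, bytes_per_batch=750 * KB):
    """Approximate bytes held once the retention limit is fully used."""
    return retained_batches * bytes_per_batch


for n in (1000, 5000):
    gb = retained_batches_bytes(n) / 1e9
    print(f"retainedBatches={n}: ~{gb:.2f} GB")
# prints:
# retainedBatches=1000: ~0.75 GB
# retainedBatches=5000: ~3.75 GB
```

Even the default of 1000 would account for roughly 0.75GB of driver heap at
this per-batch size.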
If your driver heap is not big enough, the job starts to slow down, has
frequent GCs and has long scheduling delays. Once the heap is full, the job
cannot be recovered.
A note of caution should be added to the documentation to let users know the
impact of this seemingly harmless configuration property.
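
As an illustration of the kind of guidance such a note could give, here is a
minimal sketch that works backwards from a driver heap size to a retention
limit. The 10% share of the heap reserved for UI data and the helper name are
assumptions made for this example, not Spark recommendations, and the 750KB
per-batch size is again our own measurement.

```python
def max_retained_batches(driver_heap_bytes, ui_percent=10, bytes_per_batch=750_000):
    """Largest retention limit whose estimated footprint fits the UI budget.

    ui_percent is a hypothetical share of the driver heap set aside for
    batch UI data; bytes_per_batch is an observed, job-specific figure.
    """
    budget = driver_heap_bytes * ui_percent // 100
    return budget // bytes_per_batch


heap = 4 * 10**9  # example: a 4 GB driver heap (decimal units)
n = max_retained_batches(heap)
print(f"--conf spark.streaming.ui.retainedBatches={n}")
# prints: --conf spark.streaming.ui.retainedBatches=533
```

The printed string is in the form accepted by spark-submit's --conf option;
the per-batch size should be measured for the actual job before relying on
the result.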