[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-26302:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                           3.0.0

> retainedBatches configuration can eat up memory on driver
> ---------------------------------------------------------
>
>                 Key: SPARK-26302
>                 URL: https://issues.apache.org/jira/browse/SPARK-26302
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, DStreams
>    Affects Versions: 3.0.0
>            Reporter: Behroz Sikander
>            Priority: Minor
>         Attachments: heap_dump_detail.png
>
> The documentation for the configuration "spark.streaming.ui.retainedBatches" says:
> "How many batches the Spark Streaming UI and status APIs remember before garbage collecting."
> The default for this configuration is 1000.
> In our experience the documentation is incomplete, and we found this out the hard way.
> The size of a single BatchUIData is around 750 KB. Increasing this value to something like 5000 raises the total size to ~4 GB.
> If the driver heap is not big enough, the job starts to slow down, suffers frequent GCs, and shows long scheduling delays. Once the heap is full, the job cannot be recovered.
> A note of caution should be added to the documentation to let users know the impact of this seemingly harmless configuration property.
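The arithmetic behind the report can be sketched as a back-of-the-envelope estimate. This is not Spark code; it only multiplies the ~750 KB per-BatchUIData figure observed in the attached heap dump by the configured retention count. The helper name and the assumption of a constant per-batch size are illustrative; real BatchUIData sizes vary with batch complexity.

```python
# Rough estimate of driver heap retained by spark.streaming.ui.retainedBatches.
# Assumes ~750 KB per BatchUIData, the figure observed in SPARK-26302's heap
# dump; actual per-batch size depends on the workload.

BATCH_UI_DATA_BYTES = 750 * 1024  # ~750 KB per retained batch (observed)

def retained_batches_heap_bytes(retained_batches: int) -> int:
    """Approximate heap held by the Streaming UI's retained batch data."""
    return retained_batches * BATCH_UI_DATA_BYTES

# Default setting vs. the raised setting described in the report.
default_gb = retained_batches_heap_bytes(1000) / 1024**3
raised_gb = retained_batches_heap_bytes(5000) / 1024**3

print(f"retainedBatches=1000: ~{default_gb:.2f} GB")
print(f"retainedBatches=5000: ~{raised_gb:.2f} GB")
```

Under this assumption the default already holds on the order of 0.7 GB on the driver, and 5000 batches approach the ~4 GB the reporter observed, which is why the setting deserves a warning in the documentation.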