I was recently debugging an OOM exception one of my coworkers was struggling
with and found that `SQLListener._stageIdToStageMetrics` was the culprit.
The UI was disabled in this case, but stats were still accumulating for
jobs, stages, and tasks. The job my coworker was running had over 40k tasks
in one of the stages. Does it make sense to set different defaults for the
following settings when the UI is disabled?

spark.sql.ui.retainedExecutions
spark.ui.retainedJobs
spark.ui.retainedStages
spark.ui.retainedTasks

There may be other configuration settings that should change too, but at a
minimum these four are potentially problematic, because the data they
govern can grow very large at the default values. Is there a reason these
settings keep their default values even when the UI is disabled? If not, it
seems like we could save users a lot of headaches by setting these values
to 0 when the UI is disabled.
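
In the meantime, the limits can be lowered by hand for each job. Below is a minimal sketch of that workaround, assuming the job is launched with spark-submit and that the Spark version in use accepts a small retention value. The helper names (`low_retention_conf`, `to_submit_args`) and the limit of 1 are placeholders for illustration, not anything Spark provides.

```python
# Sketch: build spark-submit --conf flags that turn the UI off and shrink
# the retention limits listed above. The limit value is a placeholder;
# check what the Spark version in use actually accepts.

RETENTION_KEYS = [
    "spark.sql.ui.retainedExecutions",
    "spark.ui.retainedJobs",
    "spark.ui.retainedStages",
    "spark.ui.retainedTasks",
]


def low_retention_conf(limit):
    """Return a dict of settings with the UI disabled and small retention limits."""
    conf = {"spark.ui.enabled": "false"}
    for key in RETENTION_KEYS:
        conf[key] = str(limit)
    return conf


def to_submit_args(conf):
    """Flatten a settings dict into ['--conf', 'key=value', ...]."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(low_retention_conf(1))))
```

The same key/value pairs could equally go into spark-defaults.conf; the point is only that every job has to opt in today, which is what a UI-aware default would avoid.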

Moreover, how does this work with streaming? It seems like this problem
would come up quite often in long-running streaming applications.

Thanks,
Craig



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/