Sorry, but I'm still not understanding this use case. Are you somehow
creating additional scheduling pools dynamically as Jobs execute? If so,
that is a very unusual thing to do. Scheduling pools are intended to be
statically configured -- initialized, living and dying with the
Application.
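
For reference, a statically configured setup looks roughly like this
(the file path and pool names are placeholders; this is just a sketch):

  import org.apache.spark.{SparkConf, SparkContext}

  // The pools are declared once, up front, in an allocation file, e.g.:
  //   <allocations>
  //     <pool name="heavy"><schedulingMode>FIFO</schedulingMode><weight>2</weight></pool>
  //     <pool name="light"><schedulingMode>FIFO</schedulingMode><weight>1</weight></pool>
  //   </allocations>
  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("static-pools-sketch")
    .set("spark.scheduler.mode", "FAIR")
    .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
  val sc = new SparkContext(conf)

  // Each thread then selects one of the pre-declared pools for its jobs:
  sc.setLocalProperty("spark.scheduler.pool", "heavy")
  sc.parallelize(1 to 1000).count()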

On Sat, Apr 7, 2018 at 12:33 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> Thanks for the clarification Imran - that helped. I was mistakenly
> assuming that these pools are removed via weak references, as the
> ContextCleaner does for RDDs, broadcasts, and accumulators, etc. For
> the time being, we'll just work around it, but I'll file a
> nice-to-have improvement JIRA. Also, you're right, we do indeed see
> these warnings, but they're usually hidden when running at the ERROR
> log level, or buried in the overwhelming output at INFO.
>
> Just to give the context: We use these scheduler pools in SystemML's
> parallel for loop construct (parfor), which allows combining data- and
> task-parallel computation. If the data fits into the remote memory
> budget, the optimizer may decide to execute the entire loop as a
> single Spark job (with groups of iterations mapped to Spark tasks). If
> the data is too large and non-partitionable, the parfor loop is
> executed as a multi-threaded operator in the driver and each worker
> might spawn several data-parallel Spark jobs in the context of the
> worker's scheduler pool, for operations that don't fit into the
> driver.
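>
> To make that pattern concrete, here is a stripped-down sketch of what a
> parfor worker does (pool names, thread count, and the loop body below
> are simplified placeholders, not the actual SystemML code):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val sc = new SparkContext(new SparkConf()
>     .setMaster("local[*]").setAppName("parfor-pools-sketch")
>     .set("spark.scheduler.mode", "FAIR"))
>   val input = sc.parallelize(1 to 1000000).cache()
>
>   // one driver-side thread per parfor worker, each tagged with its own pool
>   val workers = (0 until 4).map { id =>
>     new Thread(new Runnable {
>       override def run(): Unit = {
>         // thread-local: all jobs submitted from this thread go to this pool
>         sc.setLocalProperty("spark.scheduler.pool", s"parfor-worker-$id")
>         try {
>           // placeholder for this worker's data-parallel Spark jobs
>           input.map(_ * id.toLong).sum()
>         } finally {
>           // unset the thread-local property; the pool itself stays in the rootPool
>           sc.setLocalProperty("spark.scheduler.pool", null)
>         }
>       }
>     })
>   }
>   workers.foreach(_.start())
>   workers.foreach(_.join())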
>
> We decided to use these fair scheduler pools (w/ fair scheduling
> across pools, FIFO per pool) instead of the default FIFO scheduler
> because it gave us better and more robust performance back in the
> Spark 1.x line. This was especially true for concurrent jobs over
> shared input data (e.g., for hyperparameter tuning) and when the data
> size exceeded aggregate memory. The only downside was that we had to
> guard against scenarios where concurrent jobs would lazily pull a
> shared RDD into cache, because that led to thread contention on the
> executors' block managers and spurious replicated in-memory
> partitions.
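>
> For illustration, one shape such a guard can take is to eagerly
> materialize the shared RDD from a single thread before the concurrent
> jobs start (just a sketch, not necessarily how SystemML implements it;
> sc as in the sketch above):
>
>   import org.apache.spark.storage.StorageLevel
>
>   // Materialize the shared input once, up front, so the concurrent jobs
>   // only read already-cached partitions instead of racing to populate
>   // the cache from different threads.
>   val sharedInput = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_AND_DISK)
>   sharedInput.count()  // eager, single-threaded materialization
>
>   // ... only now spawn the concurrent worker threads over sharedInput ...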
>
> Regards,
> Matthias
>
> On Fri, Apr 6, 2018 at 8:08 AM, Imran Rashid <iras...@cloudera.com> wrote:
> > Hi Matthias,
> >
> > This doesn't look possible now. It may be worth filing an improvement
> > JIRA for it.
> >
> > But I'm trying to understand what you're trying to do a little better.
> > So you intentionally have each thread create a new unique pool when it
> > submits a job? So that pool will just get the default pool
> > configuration, and you will see lots of these messages in your logs?
> >
> > https://github.com/apache/spark/blob/6ade5cbb498f6c6ea38779b97f2325d5cf5013f2/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L196-L200
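> >
> > To make that concrete (sc is an existing SparkContext, and the pool
> > name below is arbitrary):
> >
> >   // Pointing the thread-local property at a name that was never declared
> >   // in the allocation file makes the scheduler create that pool on the fly
> >   // with default settings, logging the warning linked above.
> >   sc.setLocalProperty("spark.scheduler.pool", "some-undeclared-pool")
> >   sc.parallelize(1 to 100).count()  // this job runs in the newly created pool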
> >
> > What is the use case for creating pools this way?
> >
> > Also, if I understand correctly, it doesn't even matter if the thread
> > dies -- that pool will still stay around, as the rootPool will retain a
> > reference to it (the pools aren't actually tied to specific threads).
> >
> > Imran
> >
> > On Thu, Apr 5, 2018 at 9:46 PM, Matthias Boehm <mboe...@gmail.com>
> wrote:
> >>
> >> Hi all,
> >>
> >> for concurrent Spark jobs spawned from the driver, we use Spark's fair
> >> scheduler pools, which are set and unset in a thread-local manner by
> >> each worker thread. Typically (for rather long jobs), this works very
> >> well. Unfortunately, in an application with lots of very short
> >> parallel sections, we see 1000s of these pools remaining in the Spark
> >> UI, which indicates some kind of leak. Each worker cleans up its local
> >> property by setting it to null, but not all pools are properly
> >> removed. I've checked and reproduced this behavior with Spark 2.1-2.3.
> >>
> >> Now my question: Is there a way to explicitly remove these pools,
> >> either globally, or locally while the thread is still alive?
> >>
> >> Regards,
> >> Matthias
> >>
> >
>
>
