[
https://issues.apache.org/jira/browse/SPARK-15176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293009#comment-15293009
]
Nick White commented on SPARK-15176:
------------------------------------
[~kayousterhout] [~irashid] We use Spark to serve interactive queries submitted
by end-users. The data the queries run on is refreshed periodically, and
there's a high IO cost to reading it (as it lives in S3).
We're using the linked PR to support two pools; one serves user queries (and so
always needs hardware resources available for responsiveness) and the other
loads new data into memory as cached RDDs and performs some basic indexing.
When the new data is fully cached it's swapped with the set of RDDs the "query"
pool runs against - so users see no degradation of performance as their queries
never hit uncached data.
Under the existing scheduler implementation, we've seen tasks from the caching
& indexing pool use up all the hardware resources. When a user query arrives,
its tasks have to wait for indexing tasks to finish before they can start
executing (at which point the fair scheduler ensures both the query and the
indexing job make progress).
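
To make the delay concrete, here is a minimal sketch of the wait a query sees
when every core is already running an indexing task and nothing is pre-empted.
It is a toy model, not Spark code: the function name `query_wait` and its
parameters are hypothetical, and it assumes the first core to free up after
the query arrives goes to the query pool.

```python
import heapq


def query_wait(n_cores, durations, arrival):
    """Time a query arriving at `arrival` waits for its first core.

    Toy model (an assumption, not Spark's scheduler): indexing tasks with
    the given durations are queued at time 0 and run without pre-emption.
    """
    # Each heap entry is the time at which one core next becomes idle.
    free_at = [0.0] * n_cores
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        if start >= arrival:
            # A core frees up at or after the arrival: the query takes it.
            return start - arrival
        # The core was free before the query arrived, so indexing uses it.
        heapq.heappush(free_at, start + d)
    # Indexing work ran out; the query waits for the earliest idle core.
    return max(0.0, heapq.heappop(free_at) - arrival)


# 4 cores, 100 indexing tasks of 60s each, query arrives at t=10s.
print(query_wait(4, [60] * 100, 10))
```

In this model the wait depends only on how long the running indexing tasks
take to finish, not on any pool weight, which matches what we observe.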
> Job Scheduling Within Application Suffers from Priority Inversion
> -----------------------------------------------------------------
>
> Key: SPARK-15176
> URL: https://issues.apache.org/jira/browse/SPARK-15176
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.6.1
> Reporter: Nick White
>
> Say I have two pools, and N cores in my cluster:
> * I submit a job to one, which has M >> N tasks
> * N of the M tasks are scheduled
> * I submit a job to the second pool - but none of its tasks get scheduled
> until a task from the other pool finishes!
> This can lead to unbounded denial-of-service for the second pool - regardless
> of `minShare` or `weight` settings. Ideally Spark would support a pre-emption
> mechanism, or an upper bound on a pool's resource usage.
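
The steps in the description, and the effect of the suggested upper bound, can
be sketched with a small tick-based simulation. This is an illustrative model
only: `first_start_tick` and `max_running_a` are hypothetical names, and
`max_running_a` stands in for the proposed cap rather than an existing Spark
setting.

```python
def first_start_tick(n_cores, pool_a_tasks, task_len, b_arrival,
                     max_running_a=None):
    """Tick at which the second pool's first task can start.

    Pool A submits `pool_a_tasks` tasks of `task_len` ticks at tick 0.
    Pool B submits a job at tick `b_arrival`. Nothing is pre-empted.
    `max_running_a` is a hypothetical cap on pool A's concurrent tasks.
    """
    cap = n_cores if max_running_a is None else min(max_running_a, n_cores)
    running_a = []          # remaining ticks of each running pool A task
    pending_a = pool_a_tasks
    tick = 0
    while True:
        # Only task completion frees a core.
        running_a = [r for r in running_a if r > 0]
        free = n_cores - len(running_a)
        if tick >= b_arrival and free > 0:
            return tick
        # Pool A fills every core it is allowed to use.
        launch = min(pending_a, cap - len(running_a))
        running_a += [task_len] * launch
        pending_a -= launch
        running_a = [r - 1 for r in running_a]
        tick += 1


# N = 4 cores, M = 100 tasks of 10 ticks, second job arrives at tick 3.
print(first_start_tick(4, 100, 10, 3))                   # no cap
print(first_start_tick(4, 100, 10, 3, max_running_a=3))  # one core reserved
```

Without the cap the second pool waits until the first wave of pool A tasks
completes; with one core held back it starts on arrival.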