[jira] [Commented] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

Michael Armbrust (JIRA) Mon, 01 Feb 2016 12:00:43 -0800

    [ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126905#comment-15126905 ]


Michael Armbrust commented on SPARK-13083:
------------------------------------------

You also need to ensure that the queries run in different pools if you want
them to get a fair share of the resources.

http://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools
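
A minimal sketch of what separate pools could look like in the allocation
file. The pool names "long_running" and "interactive" and the weight and
minShare values are hypothetical, chosen only for illustration; they do not
come from the issue or from the comment above.

{code}
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool for large scans such as the count(*) over ~4500 partitions -->
  <pool name="long_running">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <!-- Hypothetical pool for short queries such as "show tables" -->
  <pool name="interactive">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
{code}

Per the linked documentation, jobs go to the default pool unless the
submitting thread sets the spark.scheduler.pool local property, for example
sc.setLocalProperty("spark.scheduler.pool", "interactive"). A file like this
therefore only helps if each query's thread selects its pool. Whether a
Zeppelin interpreter exposes that setting per query is an open assumption
here and would need checking.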

> Small spark sql queries get blocked if there is a long running query over a 
> lot a partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13083
>                 URL: https://issues.apache.org/jira/browse/SPARK-13083
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Vishal Gupta
>
> Steps to reproduce :
> a) Run a first query doing count(*) over a lot of partitions (~4500
> partitions) in S3.
> b) The Spark job for the first query starts running.
> c) Run a second query, "show tables", against the same Spark application (I
> did it using Zeppelin).
> d) As soon as the second query "show tables" is submitted, it starts showing 
> up in the "Spark Application UI" > "SQL".
> e) At this point there is only one active job running in the application 
> which corresponds to the first query.
> f) Only after the job for the first query is near completion, the job for 
> "show tables" starts appearing in "Spark Application UI" > "Jobs". 
> g) As soon as the job for "show tables" starts, it completes very fast and 
> gives the results.
> Sometimes step (c) has to be performed 1-2 minutes into the execution of the
> long-running query. But after this point, jobs do not get started for any
> number of smaller queries submitted to the Spark application until the
> long-running query is near completion.
> They seem to be blocked on the long-running query. Ideally, they should have
> started running, as all the settings are for the fair scheduler.
> I am running spark-1.5.1. In addition, I have the following configs:
> {code}
> spark.scheduler.mode FAIR
> spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml
> {code}
> /usr/lib/spark/conf/fairscheduler.xml has the following contents:
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="default">
>     <schedulingMode>FAIR</schedulingMode>
>   </pool>
> </allocations>
> {code}




