[jira] [Updated] (SPARK-13083) Small Spark SQL queries get blocked if there is a long-running query over a lot of partitions

Vishal Gupta (JIRA) Fri, 29 Jan 2016 02:26:14 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vishal Gupta updated SPARK-13083:
---------------------------------
    Description: 
Steps to reproduce:
a) Run a first query doing count(*) over a lot of partitions (~4500 partitions) 
in S3.
b) The Spark job for the first query starts running.
c) Submit a second query, "show tables", to the same Spark application (I did 
it using Zeppelin).
d) As soon as the second query "show tables" is submitted, it starts showing up 
in the "Spark Application UI" > "SQL".
e) At this point there is only one active job running in the application, which 
corresponds to the first query.
f) Only after the job for the first query is near completion does the job for 
"show tables" start appearing in "Spark Application UI" > "Jobs".
g) As soon as the job for "show tables" starts, it completes very fast and 
gives the results.

Sometimes step (c) has to be performed after 1-2 minutes of execution of the 
long-running query. But after this point, jobs do not get started for any 
number of smaller queries submitted to the Spark application until the 
long-running query is near completion.

They seem to be blocked on the long-running query. Ideally, they should have 
started running, as all the settings are for the fair scheduler.
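
To make the expectation above concrete, here is a toy model in Python. It is 
only a sketch under stated assumptions: it is not Spark's scheduler, it treats 
FAIR as plain round-robin over active jobs, every task takes one tick, and the 
slot count of 64 and submit tick of 5 are hypothetical. It does not explain 
why the "show tables" job never reaches the "Jobs" tab; it only shows when a 
one-task job would be expected to start under each policy.

```python
def first_start_ticks(mode, slots, jobs):
    """Toy model: jobs is a list of (name, submit_tick, num_tasks) in
    submission order; each task occupies one slot for one tick.
    Returns {job name: tick at which its first task starts}."""
    pending = {name: tasks for name, _, tasks in jobs}
    first_start = {}
    tick = 0
    while any(pending.values()):
        # Jobs that have been submitted and still have tasks to run.
        active = [name for name, submitted, _ in jobs
                  if submitted <= tick and pending[name] > 0]
        free = slots
        if mode == "FIFO":
            # Earlier jobs take every slot they can use before later ones.
            for name in active:
                take = min(free, pending[name])
                if take:
                    first_start.setdefault(name, tick)
                    pending[name] -= take
                    free -= take
        else:
            # Simplified FAIR: hand out slots one at a time, round-robin.
            while free and active:
                for name in list(active):
                    if free == 0:
                        break
                    first_start.setdefault(name, tick)
                    pending[name] -= 1
                    free -= 1
                    if pending[name] == 0:
                        active.remove(name)
        tick += 1
    return first_start


# Hypothetical numbers: 64 task slots, "show tables" submitted at tick 5.
jobs = [("count", 0, 4500), ("show tables", 5, 1)]
fifo = first_start_ticks("FIFO", 64, jobs)
fair = first_start_ticks("FAIR", 64, jobs)
print(fifo["show tables"], fair["show tables"])  # prints: 70 5
```

In this model the small job waits under FIFO until the long job has fewer 
pending tasks than slots, which resembles the "near completion" behaviour 
described above, while under round-robin sharing it starts in the tick it is 
submitted.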

I am running spark-1.5.1. In addition, I have the following configs:
spark.scheduler.mode FAIR
spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml

/usr/lib/spark/conf/fairscheduler.xml has the following contents 
{code}
<?xml version="1.0"?>
<allocations>
  <pool name="default">
      <schedulingMode>FAIR</schedulingMode>
   </pool>
 </allocations>
{code}
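
As a quick sanity check of the allocation file above, the following Python 
sketch parses the same XML and lists each pool with its settings. The helper 
name read_pools is hypothetical, and the fallback values (FIFO, weight 1, 
minShare 0) are my understanding of the defaults Spark applies to unspecified 
pool settings, so treat them as an assumption.

```python
import xml.etree.ElementTree as ET

# Same contents as the fairscheduler.xml shown above.
ALLOCATIONS_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="default">
      <schedulingMode>FAIR</schedulingMode>
   </pool>
 </allocations>
"""


def read_pools(xml_text):
    """Return {pool name: settings}, filling in assumed defaults for any
    element the pool does not specify."""
    root = ET.fromstring(xml_text)
    pools = {}
    for pool in root.findall("pool"):
        pools[pool.get("name")] = {
            "schedulingMode": (pool.findtext("schedulingMode") or "FIFO").strip(),
            "weight": int(pool.findtext("weight") or 1),
            "minShare": int(pool.findtext("minShare") or 0),
        }
    return pools


pools = read_pools(ALLOCATIONS_XML)
print(pools)
```

For the file above this reports a single pool named "default" in FAIR mode, 
which at least confirms that the file is well-formed and says what I intended.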





I run a query doing count(*) over a lot of partitions (~4500 partitions) in 
S3. In the same Spark application, using Zeppelin, when I run "show tables" 
it does not start until the first query comes very near completion.

I submit the second query via Zeppelin to the same Spark application. The "SQL" 
tab in the Spark application UI starts showing "show tables". But the job 

  was:
Steps to reproduce:
a) Run a first query doing count(*) over a lot of partitions (~4500 partitions) 
in S3.
b) The Spark job for the first query starts running.
c) Submit a second query, "show tables", to the same Spark application (I did 
it using Zeppelin).
d) As soon as the second query "show tables" is submitted, it starts showing up 
in the "Spark Application UI" > "SQL".
e) At this point there is only one active job running in the application, which 
corresponds to the first query.
f) Only after the job for the first query is near completion does the job for 
"show tables" start appearing in "Spark Application UI" > "Jobs".
g) As soon as the job for "show tables" starts, it completes very fast and 
gives the results.

Sometimes step (c) has to be performed after 1-2 minutes of execution of the 
long-running query. But after this point, jobs do not get started for any 
number of smaller queries submitted to the Spark application until the 
long-running query is near completion.

They seem to be blocked on the long-running query. Ideally, they should have 
started running, as all the settings are for the fair scheduler.

I am running spark-1.5.1. In addition, I have the following configs:
spark.scheduler.mode FAIR
spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml

/usr/lib/spark/conf/fairscheduler.xml has the following contents 
<?xml version="1.0"?>
<allocations>
  <pool name="default">
      <schedulingMode>FAIR</schedulingMode>
   </pool>
 </allocations>






I run a query doing count(*) over a lot of partitions (~4500 partitions) in 
S3. In the same Spark application, using Zeppelin, when I run "show tables" 
it does not start until the first query comes very near completion.

I submit the second query via Zeppelin to the same Spark application. The "SQL" 
tab in the Spark application UI starts showing "show tables". But the job 


> Small Spark SQL queries get blocked if there is a long-running query over a 
> lot of partitions
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13083
>                 URL: https://issues.apache.org/jira/browse/SPARK-13083
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Vishal Gupta
>
> Steps to reproduce:
> a) Run a first query doing count(*) over a lot of partitions (~4500 
> partitions) in S3.
> b) The Spark job for the first query starts running.
> c) Submit a second query, "show tables", to the same Spark application (I did 
> it using Zeppelin).
> d) As soon as the second query "show tables" is submitted, it starts showing 
> up in the "Spark Application UI" > "SQL".
> e) At this point there is only one active job running in the application, 
> which corresponds to the first query.
> f) Only after the job for the first query is near completion does the job for 
> "show tables" start appearing in "Spark Application UI" > "Jobs".
> g) As soon as the job for "show tables" starts, it completes very fast and 
> gives the results.
> Sometimes step (c) has to be performed after 1-2 minutes of execution of the 
> long-running query. But after this point, jobs do not get started for any 
> number of smaller queries submitted to the Spark application until the 
> long-running query is near completion.
> They seem to be blocked on the long-running query. Ideally, they should have 
> started running, as all the settings are for the fair scheduler.
> I am running spark-1.5.1. In addition, I have the following configs:
> spark.scheduler.mode FAIR
> spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml
> /usr/lib/spark/conf/fairscheduler.xml has the following contents 
> {code}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="default">
>       <schedulingMode>FAIR</schedulingMode>
>    </pool>
>  </allocations>
> {code}
> I run a query doing count(*) over a lot of partitions (~4500 partitions) in 
> S3. In the same Spark application, using Zeppelin, when I run "show tables" 
> it does not start until the first query comes very near completion.
> I submit the second query via Zeppelin to the same Spark application. The 
> "SQL" tab in the Spark application UI starts showing "show tables". But the 
> job 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

