Vishal Gupta created SPARK-13083:
------------------------------------

             Summary: Small Spark SQL queries get blocked if there is a long-running query over a lot of partitions
                 Key: SPARK-13083
                 URL: https://issues.apache.org/jira/browse/SPARK-13083
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.1
            Reporter: Vishal Gupta


Steps to reproduce:
a) Run a first query doing count(*) over a lot of partitions (~4500 partitions) in S3.
b) The Spark job for the first query starts running.
c) Run a second query, "show tables", against the same Spark application (I did it using Zeppelin).
d) As soon as the second query "show tables" is submitted, it starts showing up in the "Spark Application UI" > "SQL" tab.
e) At this point there is only one active job running in the application, and it corresponds to the first query.
f) Only when the job for the first query is near completion does the job for "show tables" start appearing in "Spark Application UI" > "Jobs".
g) As soon as the job for "show tables" starts, it completes very fast and returns the results.

Sometimes step (c) has to be performed 1-2 minutes after the long-running query starts executing. But after that point, no job is started for any of the smaller queries submitted to the Spark application until the long-running query is near completion.

They seem to be blocked by the long-running query. Ideally, they should start running right away, since all the settings are for the fair scheduler.

I am running Spark 1.5.1. In addition, I have the following configs:
spark.scheduler.mode FAIR
spark.scheduler.allocation.file /usr/lib/spark/conf/fairscheduler.xml

/usr/lib/spark/conf/fairscheduler.xml has the following contents:
<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
  </pool>
</allocations>

I run a query doing count(*) over a lot of partitions (~4500 partitions) in S3. When I then run "show tables" against the same Spark application using Zeppelin, it does not start until the first query comes very near completion.

I submit the second query via Zeppelin to the same Spark application. The "SQL" tab in the Spark application UI starts showing "show tables". But the job
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)