[jira] [Updated] (SPARK-13181) Spark delay in task scheduling within executor

Prabhu Joseph (JIRA) Wed, 03 Feb 2016 21:45:07 -0800

     [ https://issues.apache.org/jira/browse/SPARK-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Prabhu Joseph updated SPARK-13181:
----------------------------------
    Attachment: ran3.JPG

> Spark delay in task scheduling within executor
> ----------------------------------------------
>
>                 Key: SPARK-13181
>                 URL: https://issues.apache.org/jira/browse/SPARK-13181
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Prabhu Joseph
>             Fix For: 1.5.2
>
>         Attachments: ran3.JPG
>
>
> When a Spark job has some partitions of an RDD cached in memory and some in
> Hadoop, the executor starts the tasks that read from memory in parallel, but
> the task that reads from Hadoop starts only after a delay.
> Repro:
>     A 1.25 GB log file is given as input (5 partitions of 256 MB each).
>     // sc is the SparkContext; logFile is the path to the 1.25 GB input file
>     val logData = sc.textFile(logFile, 2).cache()
>     val numAs = logData.filter(line => line.contains("a")).count()  // Stage A
>     val numBs = logData.filter(line => line.contains("b")).count()  // Stage B
> Run the Spark job with 1 executor (6 GB memory, 12 cores).
> Stage A (counting lines containing "a"): the executor starts 5 tasks in
> parallel, and all of them read data from Hadoop.
> Stage B (counting lines containing "b"): the data is now partly cached (4
> partitions in memory, 1 in Hadoop), so the executor starts 4 tasks in
> parallel and, after a 4-second delay, starts the last task, which reads
> from Hadoop.
> Running the same job with 12 GB memory, all 5 partitions are cached in
> memory and the 5 tasks in Stage B start in parallel.
> Running the job with 2 GB memory, all 5 partitions are read from Hadoop
> (none are cached) and the 5 tasks in Stage B start in parallel.
> The delay occurs only when some partitions are in memory and some are in
> Hadoop. See the attached image (ran3.JPG).
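The repro passes minPartitions = 2 to sc.textFile, yet the 1.25 GB input ends up as 5 partitions of 256 MB (hence 5 tasks per stage). A simplified model of the split-size rule in Hadoop's old-API FileInputFormat, which textFile relies on, shows why: the goal size (total size / minPartitions = 640 MB) is capped at the block size, so each split is 256 MB. This is only a sketch, assuming a single input file, a 256 MB HDFS block size and the default minimum split size of 1 byte; SplitMath and its methods are hypothetical helpers, not Spark or Hadoop APIs.

```scala
// Hypothetical helper: a simplified model of how Hadoop's old-API
// FileInputFormat sizes splits for one file (assumed single input file).
object SplitMath {
  val MB: Long = 1024L * 1024L

  // The last split may be up to 10% larger than splitSize instead of
  // producing a tiny trailing split.
  private val SplitSlop = 1.1

  // splitSize = max(minSize, min(goalSize, blockSize)),
  // where goalSize = totalSize / minPartitions
  def splitSize(totalSize: Long, minPartitions: Int, blockSize: Long, minSize: Long = 1L): Long = {
    val goalSize = totalSize / math.max(minPartitions, 1)
    math.max(minSize, math.min(goalSize, blockSize))
  }

  // Number of splits (= RDD partitions) produced for a file of totalSize bytes
  def numSplits(totalSize: Long, minPartitions: Int, blockSize: Long): Int = {
    val size = splitSize(totalSize, minPartitions, blockSize)
    var remaining = totalSize
    var count = 0
    // Carve off full-size splits while more than 1.1 splits' worth remains
    while (remaining.toDouble / size > SplitSlop) {
      count += 1
      remaining -= size
    }
    // Whatever is left becomes the final split
    if (remaining != 0) count += 1
    count
  }
}
```

With the repro's numbers, SplitMath.numSplits(1280L * SplitMath.MB, 2, 256L * SplitMath.MB) returns 5: minPartitions only sets a lower bound on the split count, and here the block size is the binding limit.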



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

