[ 
https://issues.apache.org/jira/browse/SPARK-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Al M updated SPARK-24474:
-------------------------
    Description: 
I've observed an issue happening consistently when:
 * A job contains a join of two datasets
 * One dataset is much larger than the other
 * Both datasets require some processing before they are joined

What I have observed is:
 * 2 stages are initially active to run processing on the two datasets
 ** These stages are run in parallel
 ** One stage has significantly more tasks than the other (e.g. one has 30k 
tasks and the other has 2k tasks)
 ** Spark allocates a similar (though not exactly equal) number of cores to 
each stage
 * First stage completes (for the smaller dataset)
 ** Now there is only one stage running
 ** It still has many tasks left (usually > 20k tasks)
 ** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 103)
 ** This continues until the second stage completes
 * Second stage completes, and third begins (the stage that actually joins the 
data)
 ** This stage works fine, no cores are idle (e.g. Total Cores = 200, active 
tasks = 200)

Other interesting things about this:
 * It seems that when we have multiple stages active, and one of them finishes, 
it does not actually release any cores to existing stages
 * Once all active stages are done, we release all cores to new stages
 * I can't reproduce this locally on my machine, only on a cluster with YARN 
enabled
 * It happens when dynamic allocation is enabled, and when it is disabled

  was:
I've observed an issue happening consistently when:
 * A job contains a join of two datasets
 * One dataset is much larger than the other
 * Both datasets require some processing before they are joined

What I have observed is:
 * 2 stages are initially active to run processing on the two datasets
 ** These stages are run in parallel
 ** One stage has significantly more tasks than the other (e.g. one has 30k 
tasks and the other has 2k tasks)
 ** Spark allocates a similar (though not exactly equal) number of cores to 
each stage
 * First stage completes (for the smaller dataset)
 ** Now there is only one stage running
 ** It still has many tasks left (usually > 20k tasks)
 ** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 103)
 ** This continues until the second stage completes
 * Second stage completes, and third begins (the stage that actually joins the 
data)
 ** This stage works fine, no cores are idle (e.g. Total Cores = 200, active 
tasks = 200)

Other interesting things about this:
 * It seems that when we have multiple stages active, and one of them finishes, 
it does not actually release any cores to existing stages
 * Once all active stages are done, we release all cores to new stages
 * I can't reproduce this locally on my machine, only on a cluster with YARN 
enabled


> Cores are left idle when there are a lot of stages to run
> ---------------------------------------------------------
>
>                 Key: SPARK-24474
>                 URL: https://issues.apache.org/jira/browse/SPARK-24474
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.2.0
>            Reporter: Al M
>            Priority: Major
>
> I've observed an issue happening consistently when:
>  * A job contains a join of two datasets
>  * One dataset is much larger than the other
>  * Both datasets require some processing before they are joined
> What I have observed is:
>  * 2 stages are initially active to run processing on the two datasets
>  ** These stages are run in parallel
>  ** One stage has significantly more tasks than the other (e.g. one has 30k 
> tasks and the other has 2k tasks)
>  ** Spark allocates a similar (though not exactly equal) number of cores to 
> each stage
>  * First stage completes (for the smaller dataset)
>  ** Now there is only one stage running
>  ** It still has many tasks left (usually > 20k tasks)
>  ** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 
> 103)
>  ** This continues until the second stage completes
>  * Second stage completes, and third begins (the stage that actually joins 
> the data)
>  ** This stage works fine, no cores are idle (e.g. Total Cores = 200, active 
> tasks = 200)
> Other interesting things about this:
>  * It seems that when we have multiple stages active, and one of them 
> finishes, it does not actually release any cores to existing stages
>  * Once all active stages are done, we release all cores to new stages
>  * I can't reproduce this locally on my machine, only on a cluster with YARN 
> enabled
>  * It happens when dynamic allocation is enabled, and when it is disabled



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to