[
https://issues.apache.org/jira/browse/SPARK-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-3545:
-----------------------------
Component/s: Scheduler
> Put HadoopRDD.getPartitions forward and put TaskScheduler.start back in
> SparkContext to reduce DAGScheduler.JobSubmitted processing time and shorten
> cluster resources occupation period
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-3545
> URL: https://issues.apache.org/jira/browse/SPARK-3545
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Reporter: YanTang Zhai
> Priority: Minor
>
> We have two problems:
> (1) HadoopRDD.getPartitions is evaluated lazily, inside
> DAGScheduler.JobSubmitted. If the input directory is large, getPartitions may
> take a long time. For example, in our cluster it takes anywhere from 0.029s
> to 766.699s. While one JobSubmitted event is being processed, all others must
> wait. Thus, we want to move HadoopRDD.getPartitions forward to reduce
> DAGScheduler.JobSubmitted processing time, so that other JobSubmitted events
> don't have to wait as long. The HadoopRDD object could compute its partitions
> when it is instantiated.
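The eager-vs-lazy trade-off in (1) can be sketched outside of Spark. This is an illustrative Python sketch, not the actual HadoopRDD code; `SlowInput`, `LazyRDD`, `EagerRDD`, and `submit_job` are hypothetical names standing in for the input listing, the two construction strategies, and the JobSubmitted handler:

```python
import time


class SlowInput:
    """Stand-in for a large input directory whose splits are slow to list."""

    def compute_splits(self):
        time.sleep(0.1)  # simulate a slow listing, e.g. a huge inputdir
        return ["split-%d" % i for i in range(4)]


class LazyRDD:
    """Partitions are computed on first access, i.e. during job submission."""

    def __init__(self, inp):
        self._inp = inp
        self._partitions = None

    @property
    def partitions(self):
        if self._partitions is None:  # cost is paid inside the event loop
            self._partitions = self._inp.compute_splits()
        return self._partitions


class EagerRDD(LazyRDD):
    """Partitions are computed at instantiation, before job submission."""

    def __init__(self, inp):
        super().__init__(inp)
        self._partitions = inp.compute_splits()  # cost is paid up front


def submit_job(rdd):
    """Model the JobSubmitted handler: it must know the partition count."""
    start = time.perf_counter()
    n = len(rdd.partitions)
    return n, time.perf_counter() - start
```

With LazyRDD, the 0.1s listing cost lands inside submit_job and would block any other queued submissions; with EagerRDD, submit_job returns almost immediately because the splits were already computed at construction time.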
> (2) When a SparkContext object is instantiated, the TaskScheduler is started
> and some resources are allocated from the cluster. However, these resources
> may not be used immediately, for example while DAGScheduler.JobSubmitted is
> still being processed. These resources are wasted during that period. Thus,
> we want to move TaskScheduler.start back to shorten the cluster resources
> occupation period, especially on a busy cluster. The TaskScheduler could be
> started just before the stages run.
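The effect on the occupation period in (2) reduces to simple interval arithmetic. A minimal sketch in illustrative Python, with made-up timestamps rather than Spark's actual scheduler API:

```python
# Resources are held from the moment TaskScheduler.start is called until
# the last stage finishes, so delaying start shrinks the occupation period.

def occupation_period(scheduler_start, stages_end):
    """Length of time cluster resources are allocated to this app."""
    return stages_end - scheduler_start

# Before: the scheduler starts when the SparkContext is created (t=0),
# even though JobSubmitted processing and getPartitions run until t=5
# and the stages only finish at t=9.
before = occupation_period(scheduler_start=0, stages_end=9)

# After: the scheduler starts just before the stages run (t=5).
after = occupation_period(scheduler_start=5, stages_end=9)
```

With these hypothetical numbers the app holds cluster resources for 9 time units before the change and only 4 after, without the stages themselves finishing any later.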
> We can analyse and compare the execution times before and after the
> optimization:
> TaskScheduler.start execution time: [time1__]
> DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or
> TaskScheduler.start) execution time: [time2_]
> HadoopRDD.getPartitions execution time: [time3___]
> Stages execution time: [time4_____]
> (1) The app has only one job:
> (a)
> The execution time of the job before optimization is [time1__][time2_][time3___][time4_____].
> The execution time of the job after optimization is....[time3___][time2_][time1__][time4_____].
> (b)
> The cluster resources occupation period before optimization is [time2_][time3___][time4_____].
> The cluster resources occupation period after optimization is....[time4_____].
> In summary, if the app has only one job, the total execution time is the same
> before and after the optimization, while the cluster resources occupation
> period after the optimization is shorter than before.
> (2) The app has 4 jobs:
> (a) Before optimization,
> job1 execution time is [time2_][time3___][time4_____],
> job2 execution time is [time2__________][time3___][time4_____],
> job3 execution time is................................[time2____][time3___][time4_____],
> job4 execution time is................................[time2______________][time3___][time4_____].
> After optimization,
> job1 execution time is [time3___][time2_][time1__][time4_____],
> job2 execution time is [time3___][time2__________][time4_____],
> job3 execution time is................................[time3___][time2_][time4_____],
> job4 execution time is................................[time3___][time2__][time4_____].
> In summary, if the app has multiple jobs, the average execution time after
> the optimization is shorter than before, and the cluster resources occupation
> period after the optimization is also shorter than before.
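The single-job case above can be checked with concrete numbers. The durations below are made up for illustration; t1 through t4 stand for the [time1__] through [time4_____] phases defined earlier:

```python
# Hypothetical durations in seconds for the four phases.
t1 = 1.0   # TaskScheduler.start
t2 = 2.0   # JobSubmitted (excluding getPartitions / TaskScheduler.start)
t3 = 10.0  # HadoopRDD.getPartitions
t4 = 5.0   # stages execution

# Single job: the phases are merely reordered, so the end-to-end time
# is unchanged...
total_before = t1 + t2 + t3 + t4
total_after = t3 + t2 + t1 + t4

# ...but resources are held only while the scheduler is running: across
# [time2][time3][time4] before the change, and only during [time4] after.
occupation_before = t2 + t3 + t4
occupation_after = t4
```

With these numbers the end-to-end time is 18s in both cases, while the occupation period drops from 17s to 5s, matching the diagrams above.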
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]