[
https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618213#comment-16618213
]
Rong Tang commented on SPARK-25409:
-----------------------------------
Create a pull request for it. [https://github.com/apache/spark/pull/22444]
> Speed up Spark History at start if there are tens of thousands of
> applications.
> -------------------------------------------------------------------------------
>
> Key: SPARK-25409
> URL: https://issues.apache.org/jira/browse/SPARK-25409
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Rong Tang
> Priority: Major
> Attachments: SPARK-25409.0001.patch
>
>
> We have a spark history server, storing 7 days' applications. it usually has
> 10K to 20K attempts.
> We found that it can take hours at start up,loading/replaying the logs in
> event-logs folder. thus, new finished applications have to wait several
> hours to be seem. So I made 2 improvements for it.
> # As we run spark on yarn. the on-going applications' information can also
> be seen via resource manager, so I introduce in a flag
> spark.history.fs.load.incomplete to say loading logs for incomplete attempts
> or not.
> # Incremental loading applications. as I said, we have more then 10K
> applications stored, it can take hours to load all of them at the first time.
> so I introduced in a config spark.history.fs.appRefreshNum to say how many
> application to load each time, then it gets a chance the check the latest
> updates.
> Here are the benchmark I did. our system has 1K incomplete application ( it
> was not cleaned up for some reason, that is another issue that I need
> investigate), and applications' log size can be gigabytes.
>
> Not load incomplete attempts.
> | |Load Count|Load incomplete APPs|All attempts number|Time Cost|Increase
> with more attempts|
> |1 ( current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|All|No|13K|31 minutes| yes|
>
>
> Limit each time how much to load.
>
> | |Load Count|Load incomplete APPs|All attempts number|Worst Cost|Increase
> with more attempts|
> |1 ( current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|3000|Yes|13K|42 minutes except last 1.6K
> (The last 1.6K attempts cost extremely long 2.5 hours)|NO|
>
>
> Limit each time how many to load, and not load incomplete jobs.
>
> | |Load Count|Load incomplete APPs|All attempts number|Worst
> Cost|Avg|Increase with more attempts|
> |1 ( current implementation)|All|Yes|13K|2 hours 14 minutes| |Yes|
> |2|3000|NO|12K|17minutes
> |10 minutes
> ( 41 minutes in total)|NO|
>
>
> | |Load Count|Load incomplete APPs|All attempts number|Worst
> Cost|Avg|Increase with more attempts|
> |1 ( current implementation)|All|Yes|20K|1 hour 52 minutes| |Yes|
> |2|3000|NO|18.5K|20minutes|18 minutes
> (2 hours 18 minutes in total)
> |NO|
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]