[ https://issues.apache.org/jira/browse/SPARK-21598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113040#comment-16113040 ]
Eric Vandenberg commented on SPARK-21598:
-----------------------------------------
[~steve_l] Do you have any input or thoughts here? The goal is to collect
more information than is available in typical metrics. I would like to
directly correlate replay times with other attributes of the replay activity,
such as job size and user impact (i.e., was the user waiting for a response in
real time?). This is more about usability than operations: this information
would make it easier to target and measure specific improvements to the Spark
History Server user experience. We often have internal users who complain about
History Server performance, and we need a way to directly reference and
understand their experience, since the Spark History Server is critical for our
internal debugging. If there is a way to capture this information using metrics
alone, I would like to learn more, but my understanding is that metrics aren't
designed to capture this level of detail.
> Collect usability/events information from Spark History Server
> --------------------------------------------------------------
>
> Key: SPARK-21598
> URL: https://issues.apache.org/jira/browse/SPARK-21598
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.0.2
> Reporter: Eric Vandenberg
> Priority: Minor
>
> The Spark History Server doesn't currently have a way to collect
> usability/performance information about its main activity, loading/replaying
> history files. We'd like to collect this information to monitor, target and
> measure improvements in the Spark debugging experience (via History Server
> usage). Once available, these usability events could be analyzed using other
> analytics tools.
> The event info to collect:
> case class SparkHistoryReplayEvent(
>   logPath: String,
>   logCompressionType: String,
>   logReplayException: String,   // error message, if the replay failed
>   logReplayAction: String,      // user-triggered replay vs. checkForLogs replay
>   logCompleteFlag: Boolean,
>   logFileSize: Long,
>   logFileSizeUncompressed: Long,
>   logLastModifiedTimestamp: Long,
>   logCreationTimestamp: Long,
>   logJobId: Long,
>   logNumEvents: Int,
>   logNumStages: Int,
>   logNumTasks: Int,
>   logReplayDurationMillis: Long
> )
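The `logReplayDurationMillis` and `logReplayException` fields could plausibly be filled in by wrapping the replay call in a timer. The sketch below is illustrative only: `ReplayOutcome` and `ReplayTimer.timed` are hypothetical names, not part of Spark, and it assumes a replay can be expressed as a single by-name block.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical result of one timed replay; not a Spark type.
final case class ReplayOutcome[T](
    result: Option[T],
    durationMillis: Long,
    exceptionMessage: Option[String])

object ReplayTimer {
  // Runs the by-name `replay` block and measures elapsed time with the
  // monotonic nanoTime clock. Non-fatal exceptions are captured rather than
  // rethrown (Try does not catch fatal errors such as OutOfMemoryError).
  def timed[T](replay: => T): ReplayOutcome[T] = {
    val start = System.nanoTime()
    val attempt = Try(replay)
    val elapsedMillis = (System.nanoTime() - start) / 1000000L
    attempt match {
      case Success(value) => ReplayOutcome(Some(value), elapsedMillis, None)
      case Failure(error) => ReplayOutcome(None, elapsedMillis, Some(error.toString))
    }
  }
}
```

Measuring the failed attempt as well as the successful one means slow replays that end in an error still show up in the duration data.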
> The main Spark engine has a SparkListenerInterface through which all compute
> engine events are broadcast. It probably doesn't make sense to reuse this
> abstraction for broadcasting Spark History Server events, since the two kinds
> of "events" are unrelated and not compatible with one another. Also note that
> the metrics registry collects history caching metrics but doesn't provide the
> type of information listed above.
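One way to keep these events separate from SparkListenerInterface is a dedicated event hierarchy and listener bus. This is a minimal sketch with hypothetical names (`HistoryServerEvent`, `HistoryReplayEvent` carrying a trimmed subset of the fields above, `HistoryServerListener`, `HistoryServerListenerBus`); it is not an existing Spark API. It borrows the SparkListener style of no-op default callbacks and isolates a failing listener so it cannot block the others.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical History Server event hierarchy, deliberately separate from
// SparkListenerEvent. Only a few of the proposed replay fields are shown.
sealed trait HistoryServerEvent
final case class HistoryReplayEvent(
    logPath: String,
    logReplayAction: String,
    logReplayDurationMillis: Long) extends HistoryServerEvent

// Listener with no-op defaults, so implementations override only what they need.
trait HistoryServerListener {
  def onHistoryReplay(event: HistoryReplayEvent): Unit = ()
}

// Minimal synchronous bus: delivers each event to every registered listener.
class HistoryServerListenerBus {
  private val listeners = ArrayBuffer.empty[HistoryServerListener]

  def addListener(listener: HistoryServerListener): Unit = synchronized {
    listeners += listener
  }

  def post(event: HistoryServerEvent): Unit = {
    // Copy under the lock so concurrent registration doesn't disturb iteration.
    val snapshot = synchronized { listeners.toList }
    snapshot.foreach { listener =>
      try {
        event match {
          case replay: HistoryReplayEvent => listener.onHistoryReplay(replay)
        }
      } catch {
        // A misbehaving listener is reported but does not stop delivery to the rest.
        case NonFatal(error) =>
          System.err.println(s"History listener ${listener.getClass.getName} failed: $error")
      }
    }
  }
}
```

Delivery here happens on the posting thread; a production version would more likely queue events asynchronously, as Spark's LiveListenerBus does for SparkListener events, so that a slow listener cannot delay a replay.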
> The proposal here is to add some basic event listener infrastructure to
> capture History Server activity events. This would work similarly to the
> SparkListener infrastructure and could be configured in a similar
> manner, e.g. spark.history.listeners=MyHistoryListenerClass.
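The proposed `spark.history.listeners` setting could be handled by splitting a comma-separated list of class names and instantiating each one through reflection. A minimal sketch, assuming every listener class has a public no-argument constructor; `HistoryListener` and `HistoryListenerLoader` are hypothetical names, and reading the value out of the History Server's configuration is left out.

```scala
// Hypothetical listener trait for History Server events (not a Spark API).
trait HistoryListener {
  def onReplay(logPath: String, durationMillis: Long): Unit = ()
}

object HistoryListenerLoader {
  // Splits a value such as "com.example.A, com.example.B" into class names,
  // dropping blanks so that an empty setting yields no listeners.
  def parseClassNames(conf: String): List[String] =
    conf.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Instantiates each named class via its no-argument constructor,
  // failing fast if a class does not implement HistoryListener.
  def loadListeners(conf: String): List[HistoryListener] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    parseClassNames(conf).map { name =>
      val cls = Class.forName(name, true, loader)
      require(classOf[HistoryListener].isAssignableFrom(cls),
        s"$name does not implement HistoryListener")
      cls.getDeclaredConstructor().newInstance().asInstanceOf[HistoryListener]
    }
  }
}
```

Failing fast on a misconfigured class name surfaces the mistake at History Server startup rather than silently dropping events later.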
> Open to feedback / suggestions / comments on the approach or alternatives.
> cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)